
Research On The Mathematical Formula Recognition Technology For Printed Document

Posted on: 2010-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: F Chen
Full Text: PDF
GTID: 2178360278966847
Subject: Computer application technology
Abstract/Summary:
With the development of computer technology, digitizing information sources has become an important issue. Mathematical formulas play an important part in scientific and technical literature. Because manual input is difficult, research on automatic input techniques is all the more important. Current OCR (Optical Character Recognition) technology does not handle mathematical formulas correctly, although its recognition results for Chinese characters, English characters, and symbolic figures are satisfactory. The reason is that mathematical formulas have a complex two-dimensional nested structure and the meaning of a formula can vary. This makes both the recognition and the structural analysis of mathematical formulas difficult. The recognition of mathematical formulas has therefore become a research hotspot in OCR.

This research addresses mathematical formula recognition in printed documents. The problem has three components: formula extraction, formula identification, and formula structure reconstruction. Because character recognition technology is relatively mature, this thesis concentrates on the extraction of mathematical formulas and on formula analysis and reconstruction.

In this dissertation, formula extraction and formula structure reconstruction are studied in depth, with improvements in the following two aspects.

First, for the formula extraction stage, a method for extracting printed mathematical formulas is proposed that is based on the fuzzy c-means algorithm and takes the characteristics of mathematical formulas into account. The fuzzy c-means algorithm exploits the differences in line spacing, width-to-height ratio, and line density between mathematical formulas and ordinary text. This can effectively increase the formula recognition rate and improve character segmentation quality.
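As a rough illustration of the clustering idea, the sketch below runs a standard fuzzy c-means loop over per-line feature vectors. It is not the thesis's implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the deterministic initialisation, and the six feature vectors (line spacing, width-to-height ratio, line density, all scaled to [0, 1]) are assumptions made up for the example.

```python
import math


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Assumed initialisation: pick centers spread across the input order.
    centers = [list(points[(k * (n - 1)) // max(c - 1, 1)]) for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        for i, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if any(x == 0.0 for x in d):
                # Point coincides with a center: full membership there.
                hits = [x == 0.0 for x in d]
                u[i] = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [x ** (-2.0 / (m - 1.0)) for x in d]
                total = sum(inv)
                u[i] = [v / total for v in inv]
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical features per text line:
# (line spacing, width-to-height ratio, line density), scaled to [0, 1].
features = [
    (0.20, 0.90, 0.80),  # ordinary text line
    (0.25, 0.85, 0.75),  # ordinary text line
    (0.22, 0.95, 0.82),  # ordinary text line
    (0.70, 0.30, 0.35),  # formula-like line
    (0.80, 0.25, 0.30),  # formula-like line
    (0.75, 0.35, 0.40),  # formula-like line
]

centers, memberships = fuzzy_c_means(features, c=2)
# Harden the fuzzy memberships by taking the most likely cluster per line.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1, so a line that sits between the two groups keeps a soft assignment rather than being forced into one class; a real system would presumably threshold or post-process these values.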
The extraction algorithm performed well in the experiments.

Second, in the formula analysis and reconstruction stage, the baseline is used to locate the characters whose center points fall within the same threshold, and characters are placed in different child nodes according to their positional relationships, which yields an initial structure tree. Grammatical and semantic knowledge is then used to transform this structure tree into a new tree with operators as internal nodes and operands as leaves, from which the final output is produced.
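The baseline step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the `Symbol` class, the `build_tree` and `to_latex` helpers, the image-style coordinates (y grows downward), and the sample symbols are all hypothetical. Symbols whose vertical center lies within a threshold of the baseline stay on the main line; the others become superscript or subscript children of the nearest main-line symbol to their left.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    char: str
    x: float   # left edge of the bounding box
    cy: float  # vertical center; y grows downward (image coordinates)
    sup: list = field(default_factory=list)  # superscript children
    sub: list = field(default_factory=list)  # subscript children


def build_tree(symbols, baseline_y, threshold):
    """Group symbols into a main line with sup/sub child nodes."""
    main_line = []
    for s in sorted(symbols, key=lambda sym: sym.x):
        if abs(s.cy - baseline_y) <= threshold:
            main_line.append(s)
        elif main_line:
            parent = main_line[-1]
            # Above the baseline means a smaller y in image coordinates.
            (parent.sup if s.cy < baseline_y else parent.sub).append(s)
        else:
            # No parent on the left yet: keep the symbol on the main line.
            main_line.append(s)
    return main_line


def to_latex(nodes):
    """Serialize the tree, writing subscripts before superscripts."""
    parts = []
    for node in nodes:
        text = node.char
        if node.sub:
            text += "_{" + to_latex(node.sub) + "}"
        if node.sup:
            text += "^{" + to_latex(node.sup) + "}"
        parts.append(text)
    return "".join(parts)


# Hypothetical recognized symbols for a small formula.
symbols = [
    Symbol("x", x=0, cy=10),
    Symbol("2", x=6, cy=4),    # raised: superscript of x
    Symbol("+", x=12, cy=10),
    Symbol("y", x=20, cy=10),
    Symbol("1", x=26, cy=15),  # lowered: subscript of y
]

tree = build_tree(symbols, baseline_y=10, threshold=2)
latex = to_latex(tree)
```

The sketch handles only one level of nesting and stops at the initial structure tree; the later transformation into an operator tree using grammatical and semantic knowledge is not shown.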
Keywords/Search Tags: mathematical formula recognition, formula extraction, fuzzy c-means algorithm, baseline