
Research And Implementation On Detection And Recognition Algorithm For Mathematics Formulas In Documents

Posted on: 2017-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2348330485958114
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid spread and development of computer technology and the Internet, demand for digitized books and documents is growing. Digitization covers not only electronic storage but also analysis and understanding of the content. Thanks to advances in character recognition, optical character recognition (OCR) now performs well on English characters and digits in electronic books and documents. Mathematical formulas, however, come in many types, vary in size, and have a nested two-dimensional structure, so the accuracy of formula detection and recognition still falls short of practical needs.

Aiming at accurate detection and recognition of mathematical formulas in documents, this thesis studies formula detection in different layouts, and the feature extraction and recognition of digits, operator symbols, Greek letters and English letters in formulas. The main work is as follows:

(1) Formula images are preprocessed by denoising, tilt correction, thinning and spur removal, which lays the foundation for segmenting and recognizing the formula symbols.

(2) The layout features of books and documents are analyzed, and a projection-based formula detection method is presented that detects formulas in documents accurately.

(3) Projection is widely used to segment formula symbols, but it only suits formulas with a simple structure, with no subscripts and no hierarchy. To handle complex formulas with a two-dimensional nested structure, this thesis presents a segmentation method based on an improved connected-domain (connected-component) analysis, which accurately segments the characters in nested formulas.

(4) Feature extraction and classifier design are the keys to recognizing formula symbols. Considering the diversity of formulas, this thesis presents a feature extraction algorithm based on multi-feature fusion, combining hole features, cross-section features, grid-region features and invariant-moment features. Template matching, an artificial neural network and an SVM were compared as classifiers. The results show that multi-feature fusion combined with the SVM achieves higher classification accuracy than the other two classifiers. In addition, a secondary classification stage based on template matching distinguishes similar-looking characters and further improves the recognition accuracy for formula characters.
Keywords: Mathematical Formula, Formula Detection, Symbol Segmentation, Character Recognition
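
Item (3) of the abstract contrasts projection-based segmentation with connected-domain segmentation. The following is a minimal sketch of that contrast, not the thesis's method: it uses plain 8-connected component labeling rather than the "improved" variant, whose details the abstract does not give. The function names and the toy image are hypothetical, chosen only for illustration. The toy image stacks two small blocks in the same columns (as in a fraction or a subscript pair) next to a tall bar.

```python
from collections import deque


def column_projection(image):
    """Count ink pixels in each column of a binary image (list of equal-length rows)."""
    return [sum(col) for col in zip(*image)]


def split_by_projection(image):
    """Return (start, end) column ranges of ink separated by blank columns."""
    profile = column_projection(image)
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            segments.append((start, x - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def parse(rows):
    """Turn rows of '#' (ink) and '.' (background) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


# Hypothetical toy image: two stacked blocks on the left, a tall bar on the right.
TOY = parse([
    "##..#",
    "##..#",
    "....#",
    "##..#",
    "##..#",
])

if __name__ == "__main__":
    print(split_by_projection(TOY))
    print(connected_components(TOY))
```

On this toy image the column projection finds only two segments, because the stacked blocks share the same columns, while connected-component labeling returns three separate bounding boxes. Plain labeling also splits symbols made of several disjoint strokes, such as "=" or "i", so a practical system needs an additional merging step; the abstract does not say whether the thesis's improvement addresses this.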