A Research Of Online Handwritten Mathematical Formula's Segmentation And Recognition Algorithms

Posted on: 2017-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
Full Text: PDF
GTID: 2348330485988477
Subject: Computer science and theory
Abstract/Summary:
Progress in pattern recognition has indirectly promoted the development of online intelligent education systems. In online intelligent scoring for mathematics education, points are often awarded for writing the correct mathematical formulas, so recognizing students' handwriting accurately and efficiently is the key problem this thesis addresses.

Taking advantage of the fact that the online system can transmit the pst file of the answer to each question as the source file for recognition, the project studied a large number of machine learning algorithms in order to design the recognition system. The study found that segmentation quality directly affects the accuracy of the subsequent character recognition. Furthermore, the fewer characters the single-character recognition CNN needs to identify, the more accurate its results. Therefore, improving segmentation accuracy and separating out as many mathematical symbols from the handwriting as possible is a way to reduce the overall character recognition error. The segmentation system therefore needs two additional functions: first, separate descriptive language from mathematical formulas on an answer sheet; second, segment again within each part after that separation.

To meet these new functional requirements, the project follows two different ideas and puts forward two segmentation schemes.

The first method quantifies the differences between Chinese character strokes and mathematical symbols, then uses pre-established rules to group strokes. Next, it scores those combinations with the recognition system and uses the Viterbi algorithm to select the best sequence of combinations as the grouping result. This scheme, based on the properties of Chinese characters and mathematical symbols, has two implementations: excluding strokes according to the stroke characteristics of mathematical symbols, and extracting strokes according to the characteristics of Chinese characters.

The second method is built on feature learning. It is based on the macroscopic impression by which characters and mathematical formulas are told apart in an answer, and it trains a convolutional network as an unsupervised feature encoder. After training, the network is used to extract features from mathematical formulas and Chinese characters, and those features are used to train an SVM. The program first uses semi-automatic machine learning methods to divide the answer into regions, and then applies a common segmentation method within each region to segment characters.

With high segmentation accuracy ensured, the recognition optimization study focused on how to implement and improve the CNN structure and training methods, and thereby improve character recognition accuracy.

Experimental results show that training separate recognition engines for commonly used Chinese characters and for mathematical symbols helps reduce the error rate between similar symbols, and thus improves overall recognition efficiency. Using size-based judgment together with machine-learned algorithms to pre-segment Chinese and English is viable. After performance tuning and debugging, this method can convert handwritten mathematical formulas to text with improved accuracy.
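
The stroke-grouping step of the first scheme can be illustrated with a small sketch. The abstract does not give the thesis's actual formulation, so the following is only a Viterbi-style dynamic program under stated assumptions: strokes are grouped as contiguous runs in writing order, each candidate group receives a log-score from the recognizer, and the best grouping maximizes the sum of scores. The names best_grouping, toy_score, and max_group are hypothetical, and toy_score is a stand-in for a real recognition system.

```python
from typing import Callable, List, Tuple


def best_grouping(
    n_strokes: int,
    score: Callable[[int, int], float],
    max_group: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick the highest-scoring split of strokes 0..n_strokes-1 into groups.

    Groups are half-open ranges (start, end) of consecutive strokes.
    `score(start, end)` is assumed to be a recognizer log-score (higher is better).
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n_strokes + 1)  # best[i]: best total score for strokes [0, i)
    back = [0] * (n_strokes + 1)        # back[i]: start of the last group ending at i
    best[0] = 0.0

    for end in range(1, n_strokes + 1):
        # Try every group of at most max_group strokes that ends at `end`.
        for start in range(max(0, end - max_group), end):
            candidate = best[start] + score(start, end)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    # Follow the back-pointers from the last stroke to recover the groups.
    groups: List[Tuple[int, int]] = []
    end = n_strokes
    while end > 0:
        start = back[end]
        groups.append((start, end))
        end = start
    groups.reverse()
    return best[n_strokes], groups


# Hypothetical recognizer scores: three groups look like real symbols,
# every other candidate group gets a low default score.
toy_scores = {(0, 2): -0.1, (2, 3): -0.2, (3, 5): -0.3}


def toy_score(start: int, end: int) -> float:
    return toy_scores.get((start, end), -5.0)


total, groups = best_grouping(5, toy_score)
print(groups)
```

In this sketch the recognizer is called once per candidate group, so the cost grows with the number of strokes times max_group rather than with the number of possible groupings.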
Keywords/Search Tags: pattern recognition, mathematical language, paragraph segmentation, machine learning, feature extraction