
Method Research On Printed Mathematical Formula Recognition

Posted on: 2017-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zhang
GTID: 2348330488498682
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the popularization of computers and the rapid development of Internet technology, spreading and exchanging scientific and technical information over computer networks has become increasingly common, so there is an urgent need to convert books and literature into electronic form. OCR technology has matured: it can efficiently recognize the Chinese characters, English letters and digits in a document image, and it can convert a large amount of paper-based technical material into electronic versions. However, it still cannot recognize mathematical formulas well. Correctly analyzing and recognizing a formula requires not only segmenting and recognizing each individual character correctly, but also accurately analyzing the overall structure of the formula. Mathematical formulas have complex, mostly two-dimensional structures, and these characteristics make it very difficult to determine the logical relationships between characters, which makes formula recognition considerably more challenging. The details are as follows: (1) Location and extraction. In documents, mathematical formulas appear in two forms: independent formulas, which occupy lines of their own, and embedded formulas, which appear inside text lines. This thesis locates formulas in two passes to improve positioning accuracy: the first pass uses a location and extraction algorithm based on the central moment of character widths, and the second pass uses a location and extraction algorithm based on Chinese character rejection. (2) Segmentation. The purpose of character segmentation is to separate each character of the formula; this part combines the projection method with connected-component segmentation. (3) Recognition.
Each character is recognized with the traditional template matching method. (4) Formula structure analysis and representation. The formula structure is analyzed with a method based on feature characters, and the Word EQ field is used as the formula description tool to convert each formula into text format.
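Step (1) names a first-pass locator based on the central moment of character widths but does not define the score. The minimal sketch below assumes one plausible reading: on a Chinese text line the character boxes are nearly uniform in width, so the second central moment (variance) of the widths is small, while a line holding an independent formula mixes narrow and wide symbols. The function names, the normalization by the squared mean and the threshold `0.1` are hypothetical choices, not the thesis's parameters.

```python
from statistics import mean


def width_central_moment(widths, order=2):
    """Return the order-th central moment of one line's character widths."""
    mu = mean(widths)
    return sum((w - mu) ** order for w in widths) / len(widths)


def flag_formula_lines(lines, threshold=0.1):
    """Return indices of text lines whose character widths vary strongly.

    `lines` holds one list of character-box widths (pixels) per text line.
    The 2nd central moment (variance) is divided by mean**2 so the score does
    not depend on font size. `threshold` is a hypothetical placeholder value.
    """
    flagged = []
    for i, widths in enumerate(lines):
        if len(widths) < 2:  # a single box gives no spread information
            continue
        score = width_central_moment(widths) / mean(widths) ** 2
        if score > threshold:
            flagged.append(i)
    return flagged


# Uniform Chinese-text widths vs. mixed formula-symbol widths.
print(flag_formula_lines([[20, 20, 21, 20, 19], [8, 20, 6, 30, 10]]))  # [1]
```

Here the normalized scores are about 0.001 and 0.37, so only the second line is flagged. Normalizing by the squared mean keeps the score comparable across font sizes; a real system would tune the threshold on labeled pages.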
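The second pass of step (1) locates embedded formulas by rejecting Chinese characters. A minimal sketch, assuming that printed Chinese characters on a line are roughly square boxes of a known size `char_size` (for example, the median box height of the line) and that runs of consecutive boxes failing that test are embedded-formula candidates. The tolerance `tol` and the function name are hypothetical; this simple rule would also keep Chinese punctuation, which a real system must filter separately.

```python
def embedded_formula_spans(boxes, char_size, tol=0.15):
    """Find runs of non-Chinese-looking boxes on one text line.

    boxes: (x, y, w, h) tuples sorted left to right.
    char_size: typical Chinese character size on this line (pixels).
    tol: relative tolerance for "roughly square, roughly char_size" (hypothetical).
    Returns (start, end) index pairs, end exclusive.
    """

    def looks_chinese(box):
        _, _, w, h = box
        limit = tol * char_size
        return abs(w - char_size) <= limit and abs(h - char_size) <= limit

    spans, start = [], None
    for i, box in enumerate(boxes):
        if looks_chinese(box):
            if start is not None:  # a formula run just ended
                spans.append((start, i))
                start = None
        elif start is None:  # a formula run starts here
            start = i
    if start is not None:  # run reaching the end of the line
        spans.append((start, len(boxes)))
    return spans


line = [(0, 0, 20, 20), (22, 0, 19, 21), (44, 5, 8, 12),
        (54, 2, 10, 18), (66, 5, 6, 6), (74, 0, 20, 20)]
print(embedded_formula_spans(line, char_size=20))  # [(2, 5)]
```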
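Step (2) combines the projection method with connected-component segmentation. One common way to combine them, sketched here under our own assumptions: split the binary formula image at blank columns of its vertical projection, then label 8-connected components inside each slice so that vertically stacked symbols sharing columns (a numerator, fraction bar and denominator) come out as separate boxes. The image format (a list of rows, 1 = ink) and all function names are hypothetical.

```python
from collections import deque


def vertical_projection_segments(img):
    """Split a 0/1 image into column ranges separated by blank columns."""
    width = len(img[0])
    profile = [sum(row[x] for row in img) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            segments.append((start, x))  # end column is exclusive
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def connected_components(img, x0, x1):
    """Bounding boxes (x, y, w, h) of 8-connected ink regions in columns [x0, x1)."""
    height = len(img)
    seen, boxes = set(), []
    for y in range(height):
        for x in range(x0, x1):
            if not img[y][x] or (y, x) in seen:
                continue
            seen.add((y, x))
            queue = deque([(y, x)])
            min_x = max_x = x
            min_y = max_y = y
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and x0 <= nx < x1
                                and img[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def segment_symbols(img):
    """Projection split first, then connected components within each slice."""
    boxes = []
    for x0, x1 in vertical_projection_segments(img):
        boxes.extend(connected_components(img, x0, x1))
    return sorted(boxes)


img = [
    [1, 1, 0, 0, 0],  # numerator stroke
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],  # fraction bar
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],  # denominator stroke
]
print(segment_symbols(img))
# [(0, 0, 2, 1), (0, 2, 2, 1), (0, 4, 2, 1), (3, 1, 1, 2)]
```

Projection alone would return the three stacked strokes as one column slice; the connected-component pass is what separates them.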
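Step (3) uses traditional template matching. A minimal sketch, assuming binary glyph bitmaps, nearest-neighbour resizing to a fixed `size` by `size` grid, and the fraction of agreeing pixels as the similarity score; the thesis may use a different normalization or distance, and the two toy templates below are purely illustrative.

```python
def resize_nearest(bitmap, size):
    """Nearest-neighbour resize of a 0/1 bitmap to size x size."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def match_score(a, b):
    """Fraction of pixels on which two equal-size bitmaps agree."""
    total = len(a) * len(a[0])
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(bitmap, templates, size=8):
    """Return (label, score) of the best-matching template."""
    glyph = resize_nearest(bitmap, size)
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        score = match_score(glyph, resize_nearest(tmpl, size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


templates = {  # toy 3x3 templates, purely illustrative
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "|": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
sample = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(recognize(sample, templates))  # ('-', 0.875)
```

Resizing every glyph and template to the same grid is what makes the pixel comparison meaningful across font sizes; it is also the main weakness of plain template matching, since thickened or slanted glyphs lose agreement quickly.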
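Step (4) analyzes structure with a feature-character method and writes the result as a Word EQ field. The sketch below is an illustrative assumption rather than the thesis's algorithm: it treats a recognized `-` glyph with symbols both above and below it as a fraction bar (a structure-defining character), recurses into the numerator and denominator, and emits the EQ field's `\f(numerator,denominator)` fraction switch. Superscripts, radicals and other structures are not handled, and the `Symbol` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    label: str  # recognized character, e.g. "x" or "-"
    x: int      # left edge
    y: int      # top edge (image y grows downward)
    w: int
    h: int


def _center_x(s):
    return s.x + s.w / 2


def to_eq(symbols):
    """Return an EQ-field body for a list of recognized symbols."""
    remaining = sorted(symbols, key=lambda s: s.x)
    bars = [s for s in remaining if s.label == "-"]
    # Try the widest "-" first so an outer fraction bar wins over inner ones.
    for bar in sorted(bars, key=lambda s: -s.w):
        spanned = [s for s in remaining
                   if s is not bar and bar.x <= _center_x(s) <= bar.x + bar.w]
        above = [s for s in spanned if s.y + s.h <= bar.y]
        below = [s for s in spanned if s.y >= bar.y + bar.h]
        if above and below:  # a "-" with both operands acts as a fraction bar
            left = [s for s in remaining if _center_x(s) < bar.x]
            right = [s for s in remaining if _center_x(s) > bar.x + bar.w]
            # Symbols in the bar's span that overlap it vertically are dropped
            # by this simplified rule.
            return (to_eq(left) + f"\\f({to_eq(above)},{to_eq(below)})"
                    + to_eq(right))
    return "".join(s.label for s in remaining)  # no fraction: plain run


def eq_field(symbols):
    """Wrap the body as field-code text; in Word the braces are field
    delimiters inserted with Ctrl+F9, not typed characters."""
    return "{ EQ " + to_eq(symbols) + " }"


# 1/2 + x, with boxes laid out as on the page.
formula = [Symbol("-", 0, 10, 10, 1), Symbol("1", 3, 2, 4, 6),
           Symbol("2", 3, 13, 4, 6), Symbol("+", 12, 7, 6, 6),
           Symbol("x", 20, 7, 6, 6)]
print(eq_field(formula))  # { EQ \f(1,2)+x }
```

Because numerator and denominator are handled by the same function, nested fractions produce nested `\f(...)` switches without extra code.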
Keywords/Search Tags: Formula Recognition, Formula Positioning, Symbol Recognition, Structure Analysis