
System Of Mathematical Formula Recognition In Printed Chinese Documents

Posted on: 2008-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: R Li
GTID: 2178360215459938  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Mathematical expressions are a core part of most scientific and technical documents. Entering them on a computer is difficult, however, not only because they use a wide variety of characters and symbols but also because their layout varies so much. Recognizing the mathematical expressions in printed documents, which encode many of the rules of science and technology, would make these expressions searchable and thus improve how scientific and technical literature can be used.

The system proposed in this thesis recognizes mathematical expressions and reconstructs the recognized expressions in a specific publication format. It works as follows.

First, image preprocessing. Noise is introduced while the document image is created, and such images are hard to process directly, so they are first cleaned up to make them suitable for the later stages.

Second, mathematical formula labeling. This thesis proposes a statistical method for judging whether a text line in a typeset Chinese document contains mathematical formulas. The resulting values differ greatly between pure text lines and lines that contain formulas. Once the lines containing formulas are identified, the formula symbols are isolated and labeled according to their morphological differences from Chinese characters.

Third, symbol recognition. An extracted formula contains many characters, so the system must determine which characters it contains and how they are positioned relative to one another. A fast method segments each character from the formula, and each symbol is then recognized by template matching.

Fourth, structure analysis. The structure of the formula is analyzed on the basis of characteristic characters, with a different algorithm for each kind of characteristic character. The input expression is segmented repeatedly until no block contains a superscript or subscript.
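The recursive superscript/subscript segmentation in the structure-analysis step can be illustrated with a minimal sketch. It assumes every recognized symbol comes with a bounding box and uses a simple vertical-centre rule to decide whether a following symbol is a superscript or a subscript of the current base symbol. This rule and the names `Box`, `Node` and `parse` are hypothetical stand-ins, not the thesis's characteristic-character algorithm. Each block is segmented recursively until it contains no further scripts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """Bounding box of one recognized symbol; y grows downward as in image coordinates."""
    label: str
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Node:
    """A base symbol with optional superscript and subscript sub-expressions (hypothetical tree)."""
    label: str
    sup: List["Node"] = field(default_factory=list)
    sub: List["Node"] = field(default_factory=list)


def parse(boxes: List[Box]) -> List[Node]:
    """Split a run of symbols into base/superscript/subscript blocks, recursing into each block."""
    boxes = sorted(boxes, key=lambda b: b.left)
    nodes: List[Node] = []
    i = 0
    while i < len(boxes):
        base = boxes[i]
        centre = (base.top + base.bottom) / 2
        sup: List[Box] = []
        sub: List[Box] = []
        j = i + 1
        # Assumed rule: following symbols lying wholly above (or below) the base's vertical
        # centre are its superscript (or subscript); the first symbol that straddles the
        # centre is back on the baseline and becomes the next base.
        while j < len(boxes):
            b = boxes[j]
            if b.bottom <= centre:
                sup.append(b)
            elif b.top >= centre:
                sub.append(b)
            else:
                break
            j += 1
        # Recursion stops when a block is empty, i.e. it has no further scripts.
        nodes.append(Node(base.label, parse(sup), parse(sub)))
        i = j
    return nodes


if __name__ == "__main__":
    # Hand-made boxes for "x^2 + y_i".
    line = [
        Box("x", 0, 10, 10, 30), Box("2", 11, 2, 16, 14), Box("+", 18, 15, 26, 25),
        Box("y", 28, 16, 38, 36), Box("i", 39, 28, 43, 40),
    ]
    print(parse(line))
```

The vertical-centre rule is only a placeholder: a baseline comma or period would be misread as a subscript, and fraction bars or radicals would need their own handling, which is presumably where the thesis's per-character algorithms come in.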
Finally, result output. This part describes how Word EQ fields are used, and the parse tree produced by the structure analyzer is then converted into a Word EQ document.
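The output step can be illustrated with a minimal sketch that walks a small parse tree and emits Word EQ field code using the `\s\up` (superscript), `\s\do` (subscript) and `\f` (fraction) switches. The node types (`Sym`, `Row`, `Sup`, `Sub`, `Frac`), the 3-point script offset and the escaping rule (a backslash before a backslash, comma or opening parenthesis) are assumptions made for illustration. The result is the text that would go between Word's field braces, not a complete document file.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical parse-tree node types; the thesis does not specify its tree format.
@dataclass
class Sym:
    text: str


@dataclass
class Row:
    items: List["Expr"]


@dataclass
class Sup:
    base: "Expr"
    exp: "Expr"


@dataclass
class Sub:
    base: "Expr"
    index: "Expr"


@dataclass
class Frac:
    num: "Expr"
    den: "Expr"


Expr = Union[Sym, Row, Sup, Sub, Frac]


def escape(text: str) -> str:
    """Escape characters that EQ treats as syntax (assumed: backslash, comma, opening parenthesis)."""
    return text.replace("\\", "\\\\").replace(",", "\\,").replace("(", "\\(")


def to_eq(e: Expr, shift_pt: int = 3) -> str:
    """Recursively emit EQ switches: \\s\\up / \\s\\do for scripts, \\f for fractions."""
    if isinstance(e, Sym):
        return escape(e.text)
    if isinstance(e, Row):
        return "".join(to_eq(x, shift_pt) for x in e.items)
    if isinstance(e, Sup):
        return f"{to_eq(e.base, shift_pt)}\\s\\up{shift_pt}({to_eq(e.exp, shift_pt)})"
    if isinstance(e, Sub):
        return f"{to_eq(e.base, shift_pt)}\\s\\do{shift_pt}({to_eq(e.index, shift_pt)})"
    if isinstance(e, Frac):
        return f"\\f({to_eq(e.num, shift_pt)},{to_eq(e.den, shift_pt)})"
    raise TypeError(f"unknown node: {e!r}")


def to_field_code(e: Expr) -> str:
    """Field code to place between Word field braces."""
    return "EQ " + to_eq(e)


if __name__ == "__main__":
    # x^2 + a_i / b
    expr = Row([Sup(Sym("x"), Sym("2")), Sym("+"), Frac(Sub(Sym("a"), Sym("i")), Sym("b"))])
    print(to_field_code(expr))  # EQ x\s\up3(2)+\f(a\s\do3(i),b)
```

Emitting field code as plain text keeps the converter independent of any particular document library; writing it into an actual Word file is a separate step not shown here.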
Keywords/Search Tags: mathematical expression recognition, structure analysis, formula extraction, symbol labeling, symbol recognition