Mathematical Formula Recognition In Typeset Chinese Documents

Posted on: 2006-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: T F Gao
GTID: 2168360155468550
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid growth in the number of Internet users in recent years, more and more information is disseminated and exchanged through this channel. Digital libraries and distance learning have become active research areas that address issues arising from the widespread use of the Internet. A key step toward realizing these ideas is the development of cheap and efficient methods for converting existing knowledge on paper into electronic form, which can be processed by today's digital computers and transmitted over the Internet. However, widely used commercial OCR systems cannot handle scientific documents that contain mathematical expressions (MEs), so a new OCR approach capable of recognizing MEs is needed.

In this paper, I propose an approach for understanding MEs in printed Chinese documents. The system consists of two parts: (i) detection and segmentation of MEs within a separated document line, and (ii) recognition of the symbols in each ME.

First, a statistical method is proposed to determine whether a text line in a typeset Chinese document contains mathematical formulas. The second moment of the widths of the symbols in the line is calculated; its value differs greatly between pure text lines and lines that contain mathematical formulas. Once the lines containing mathematical formulas have been identified, the formula symbols are isolated and labeled according to their morphological differences from Chinese characters.

Next, each mathematical symbol is normalized and divided into 6 rows and 6 columns of equally sized rectangular blocks.
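The line-classification statistic described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a binarized text-line image (ink = nonzero), splits symbols at blank columns of the vertical projection profile, and reads "second moment" as the second central moment (variance) of the symbol widths divided by the line height. The function names and the threshold value are hypothetical; a real threshold would have to be tuned on labeled pure-text and formula lines.

```python
import numpy as np


def symbol_widths(line_img):
    """Widths of the runs of inked columns in a binarized text-line image.

    Assumption: neighbouring symbols are separated by at least one blank
    column, so the vertical projection profile splits them.
    """
    has_ink = (np.asarray(line_img) > 0).any(axis=0)  # one flag per column
    widths, run = [], 0
    for col in has_ink:
        if col:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:  # line ends on an inked column
        widths.append(run)
    return widths


def width_second_moment(widths, line_height):
    """Second central moment (variance) of the symbol widths.

    Widths are divided by the line height so the value does not depend on
    font size; this normalization is an assumption, not from the thesis.
    """
    w = np.asarray(widths, dtype=float) / line_height
    return float(np.mean((w - w.mean()) ** 2))


def contains_formula(line_img, threshold=0.05):
    """Hypothetical rule: flag the line when the statistic exceeds a threshold.

    The default threshold is a placeholder to be tuned on labeled lines.
    """
    img = np.asarray(line_img)
    widths = symbol_widths(img)
    if len(widths) < 2:  # too few symbols to measure any spread
        return False
    return width_second_moment(widths, img.shape[0]) > threshold
```

Because Chinese characters in one font occupy boxes of nearly the same width, a pure-text line gives a value close to zero, while narrow symbols such as digits, operators, and parentheses mixed with wider ones raise it. A plain projection split will also break left-right structured characters such as 川 into several pieces, so a practical system would need a merging rule, which this sketch leaves out.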
The number of black pixels in each block is then counted to form a 36-dimensional (6 × 6) feature vector, which is supplemented by the number of stroke intersections along the horizontal and vertical lines that divide the symbol into equal thirds. Finally, a template-based method is used to recognize the symbols.
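The normalization, feature extraction, and template matching steps can be sketched as below. Several details are assumptions not stated in the abstract: each symbol is cropped to its ink bounding box and scaled, keeping its aspect ratio, into a hypothetical 36 × 36 grid so that every one of the 6 × 6 blocks is 6 × 6 pixels; the trisection is read as two horizontal and two vertical lines at one third and two thirds of the grid, each contributing the number of strokes it crosses; and the template match is a nearest-neighbour search under Euclidean distance. All function names are illustrative.

```python
import numpy as np

GRID = 6    # 6 x 6 blocks, as in the abstract
SIZE = 36   # hypothetical normalized size; a multiple of both 6 and 3


def normalize(symbol_img, size=SIZE):
    """Crop to the ink bounding box and scale into a size x size grid.

    Nearest-neighbour sampling; aspect ratio is kept and the glyph is
    centred (both are assumptions). Expects at least one inked pixel.
    """
    img = np.asarray(symbol_img) > 0
    ys, xs = np.nonzero(img)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = img.shape
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.arange(nh) * h // nh  # source row for each output row
    cols = np.arange(nw) * w // nw  # source column for each output column
    out = np.zeros((size, size), dtype=np.uint8)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = img[np.ix_(rows, cols)]
    return out


def block_features(norm, grid=GRID):
    """Black-pixel count in each of the grid x grid equal blocks (36 values)."""
    n = norm.shape[0] // grid
    return norm.reshape(grid, n, grid, n).sum(axis=(1, 3)).ravel()


def crossings(line):
    """Number of separate black runs (strokes) met along one scan line."""
    padded = np.concatenate(([0], np.asarray(line, dtype=int)))
    return int(np.count_nonzero(np.diff(padded) == 1))


def crossing_features(norm):
    """Stroke crossings on the two horizontal and two vertical trisecting lines."""
    a, b = norm.shape[0] // 3, 2 * norm.shape[0] // 3
    return np.array([crossings(norm[a, :]), crossings(norm[b, :]),
                     crossings(norm[:, a]), crossings(norm[:, b])])


def features(symbol_img):
    """36 block counts followed by 4 crossing counts (40 values)."""
    norm = normalize(symbol_img)
    return np.concatenate([block_features(norm), crossing_features(norm)]).astype(float)


def recognize(symbol_img, templates):
    """Return the template label whose feature vector is nearest (Euclidean)."""
    f = features(symbol_img)
    return min(templates, key=lambda label: np.linalg.norm(f - templates[label]))
```

A template dictionary would be built by running `features` on reference glyphs of each symbol class, for example `templates = {label: features(img) for label, img in reference_glyphs.items()}` with a hypothetical `reference_glyphs` mapping, after which `recognize(symbol_img, templates)` returns the closest label. Keeping the aspect ratio matters for thin symbols: stretching a minus sign to fill the square would make it indistinguishable from a vertical bar.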
Keywords/Search Tags: Mathematical Formula Recognition, Symbol Labeling, Symbol Recognition