Research On Printed Chinese Documents Recognition System

Posted on: 2010-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
Full Text: PDF
GTID: 2178360275978671
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the arrival of the information age, document content has become increasingly complex. A Chinese document contains not only Chinese text but also English text, formulas of various kinds, tables, and images. Entering this content into a computer quickly and efficiently is a key task in information processing. Chinese character recognition systems were developed for recognition-based input and format conversion, yet the systems currently in use cannot recognize formulas. Designing a Chinese document recognition system that includes formula character recognition therefore has both practical value and theoretical significance.

Building on a study of current character recognition theories and related technologies, this thesis researches and designs a printed Chinese document recognition system that includes formula character recognition. The system consists mainly of layout analysis, Chinese character recognition, and formula character recognition. The work is as follows.

First, after the original document image is processed, layout analysis is performed using multi-level credibility and projection features. The document content is separated into text, table, and image regions. The text regions contain Chinese and formula characters and are passed to the multi-character recognition function.

Second, a two-level positioning method is used to locate formulas, with the output of Chinese character recognition serving as the basis for locating them. Individual formula characters are segmented using connected domains (connected components). A formula feature library is built from the hole features, grid features, and thread features of 567 characters, and multi-level classifiers recognize each character and determine its properties. These properties are then used for the structure analysis of formulas. The characteristic-character-based method is improved, 7 sub-algorithms for structure analysis are extended and redesigned, and 10 Word EQ syntaxes are used to generate a unique one-dimensional string for each formula.

Finally, these algorithms are integrated into a Chinese document recognition system named MYOCR. The results on the given example verify the effectiveness of the algorithms and the advantages of MYOCR, and the recognition rate of MYOCR is satisfactory. This work lays a foundation for further research.
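Three of the building blocks named in the abstract (projection features, connected-domain segmentation, and hole features) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the tiny binary test image `PAGE`, and the choice of 8-connectivity for foreground pixels are all assumptions made for illustration, and the multi-level credibility, grid features, thread features, and classifiers are not reproduced.

```python
from collections import deque

# Neighbour offsets (hypothetical helper constants, not from the thesis).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def row_profile(img):
    """Horizontal projection: number of foreground pixels in each row."""
    return [sum(row) for row in img]


def nonzero_runs(profile):
    """Return (start, end) pairs of consecutive nonzero entries, end exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def connected_components(img, neighbors=N8):
    """Return sorted bounding boxes (top, left, bottom, right), inclusive,
    of the connected foreground regions, found by breadth-first search."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes)


def crop(img, box):
    """Cut the sub-image covered by an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def count_holes(glyph):
    """Hole feature: number of background regions enclosed by the glyph.
    Assumes a non-empty glyph. The glyph is padded with background so the
    outside forms one 4-connected region, which is then discounted."""
    w = len(glyph[0])
    padded = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in glyph] + [[0] * (w + 2)]
    background = [[1 - v for v in row] for row in padded]
    return len(connected_components(background, neighbors=N4)) - 1


# Toy binary page: a ring and a bar on one text line, a dash on the next.
PAGE = [[int(ch) for ch in row] for row in [
    "0000000000",
    "0111001000",
    "0101001000",
    "0111001000",
    "0000000000",
    "0110000000",
    "0000000000",
]]

text_lines = nonzero_runs(row_profile(PAGE))       # row ranges of text lines
char_boxes = connected_components(PAGE)            # one box per character
holes = [count_holes(crop(PAGE, b)) for b in char_boxes]
```

On this toy page the projection profile yields two text lines, the connected-component pass yields three character boxes, and only the ring-shaped character has a hole. In a real system such features would feed a classifier; here they are computed only to show the idea.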
Keywords/Search Tags: layout analysis, Chinese character recognition, formula character recognition, structure analysis