Study Of Chinese Printed Documents Recognition System

Posted on: 2008-10-23 | Degree: Master | Type: Thesis
Country: China | Candidate: W P Liu | Full Text: PDF
GTID: 2178360242964347 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As information is communicated and exchanged ever faster, automatically converting documents stored on paper into digital form has become a meaningful task. Developing an effective document information processing system is therefore an urgent need.

The document information processing system we propose consists of several modules: layout analysis, layout understanding, character recognition, mathematical expression recognition, table processing, and layout reconstruction. This thesis concentrates on layout analysis, printed Chinese character recognition, and mathematical expression recognition, with mathematical expression recognition as the main task.

For layout analysis, we chose a bottom-up algorithm based on nearest-neighbor connection strength and line confidence to segment the page into image areas, table areas, and text areas. In the printed Chinese character recognition module, we compute the degree of incorporative difference to match each character, and we define a rejection class that locates mathematical expressions automatically. Each located expression is passed to the formula processing module, where a character segmentation method based on connectivity and a template matching method are used to recognize the characters in the expression. Finally, a character-based structure analysis transforms the two-dimensional formula into a one-dimensional Word EQ expression.

After this processing, the system outputs plain text. The Chinese characters and the Word EQ expressions appear in the output in the same order as their counterparts in the original document.

The system extracts the text areas correctly, and it locates formulas more effectively than the other formula location methods it was compared with. The processing performed before structure analysis also improves the recognition rate.
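The connectivity-based character segmentation step mentioned above can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a binarized image given as a list of rows (1 for ink, 0 for background), uses 8-connectivity, and the function name `segment_components` is hypothetical. It labels each connected blob with a breadth-first search and returns one bounding box per blob, ordered left to right.

```python
from collections import deque


def segment_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground blobs, sorted by left edge.

    Assumption: `image` is a list of equal-length rows, 1 = ink, 0 = background.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new component and grow it with breadth-first search.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (the centre pixel is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))

    # Order candidates left to right, then top to bottom.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

Each bounding box would then be cropped and handed to a recognizer such as template matching. Note that symbols made of several disconnected strokes (for example "=" or "i") come out as separate boxes under pure connectivity, so a real system needs a later merging step.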
Keywords/Search Tags:document information processing system, layout analysis, printed Chinese character recognition, mathematical expression recognition