
Chinese Forum Punctuation Extraction And Recognition

Posted on: 2007-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Xue
Full Text: PDF
GTID: 2208360185491454
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Document layout analysis is an important component of an OCR system and strongly affects the accuracy of character recognition. Punctuation is an important part of a document because it determines the structure and boundaries of sentences, so extracting and recognizing punctuation marks benefits document segmentation. Most researchers, however, have concentrated on words and have ignored punctuation. This thesis therefore focuses on how to extract and recognize the punctuation in sentences. The main contents are:
(1) The preprocessing stage of document layout analysis is studied. After comparing existing methods, the Hough transform is used for skew correction, a median filter is used to smooth speckle noise, and a threshold selection method based on moments is used to produce the binary image.
(2) For document layout analysis, the projection method and the connectivity-segmentation method are studied. A method based on dilation, an operation from mathematical morphology, is then presented. Its layout analysis results are better than those of the other methods.
(3) Building on the results of preprocessing and layout analysis, two methods, one based on template matching and one based on SVM, are used to recognize the characters.
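The dilation-based layout analysis in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation. The rectangular structuring element, the 4-connectivity, the helper names `dilate` and `component_boxes`, and the toy `page` image are all assumptions made for illustration. The idea shown is that dilating a binary image merges nearby strokes into one region, so that connected-component labeling afterwards yields blocks rather than isolated strokes.

```python
from collections import deque

def dilate(img, half_w=1, half_h=0):
    """Binary dilation with a (2*half_h+1) x (2*half_w+1) rectangle (assumed element)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                # Stamp the structuring element around every foreground pixel.
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1)):
                    for cc in range(max(0, c - half_w), min(cols, c + half_w + 1)):
                        out[rr][cc] = 1
    return out

def component_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

# Toy binary image (1 = ink): two close strokes on the left, one far stroke on the right.
page = [
    [1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(len(component_boxes(page)))     # three separate strokes before dilation
print(component_boxes(dilate(page)))  # the two close strokes merge into one block
```

In this sketch the horizontal element bridges the one-pixel gap between the two left strokes but not the wider gap before the right stroke. The reported boxes are slightly enlarged by the dilation itself, so a real pipeline would likely map them back onto the undilated image before recognition.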
Keywords/Search Tags: Document Analysis, Document Segmentation, Skew Correction, Projection-Segmentation, Connectivity-Segmentation, Character Recognition, Template Matching, Support Vector Machines