
Segmentation Of Text Lines In Off-Line Chinese Handwritten Character Recognition

Posted on: 2008-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2178360245498178
Subject: Computer Science and Technology
Abstract/Summary:
Chinese characters are the crystallization of thousands of years of Chinese civilization and have played a very important role in the development of our society. Digitizing handwritten documents is therefore of great importance. Optical Character Recognition (OCR) provides a fast, automatic way to input characters. Character recognition is divided into on-line and off-line recognition, and we focus on off-line Chinese character recognition. Chinese character recognition differs greatly from English character recognition and is less developed. Judging from the development of English character recognition, Chinese character recognition should move from single-character recognition to sentence recognition, and many problems remain.

Existing databases for Chinese character recognition contain only single characters and do not support sentence recognition. We establish a Chinese character database called HIT-MW to study segmentation-free recognition. The page is the basic unit of HIT-MW and must be segmented into lines before recognition. That segmentation is the task addressed in this thesis.

First, we apply global horizontal projection, small-angle skew correction, a second global horizontal projection, and partial horizontal projection to obtain the multiple lines set. We then analyze this set thoroughly and divide it into four subsets, each of which is treated with a different method.

Next, we skeletonize the text, which may yield a few candidate segmentation paths. For the subsets where no candidate paths are obtained this way, we thin the strokes, extract characteristic points, and combine them with the skeleton image to search for candidate paths.

Finally, once all candidate paths have been obtained, we try a number of evaluation functions and choose the Gaussian mixture function. We achieve a correct rate of 62.64% on the whole multiple lines set.
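To make the first step more concrete, the following is a minimal sketch of a global horizontal projection on a binarized page. It is an illustration only and not the implementation used in the thesis. The function names `horizontal_projection` and `split_lines`, the `min_ink` threshold, and the toy `page` array are all assumptions introduced here. The sketch assumes the image is already binarized (1 = ink, 0 = background) and deskewed.

```python
def horizontal_projection(image):
    """Count ink pixels in each row of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def split_lines(profile, min_ink=1):
    """Return (start, end) row ranges, end exclusive, whose ink count is >= min_ink.

    min_ink is a hypothetical threshold; rows below it are treated as gaps.
    """
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # a band of text rows begins
        elif count < min_ink and start is not None:
            bands.append((start, y))       # a gap row closes the current band
            start = None
    if start is not None:                  # band runs to the bottom of the page
        bands.append((start, len(profile)))
    return bands


# Toy page (an assumption for illustration): two text bands separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0],
]

profile = horizontal_projection(page)
bands = split_lines(profile)
print(profile)
print(bands)
```

A band found this way may still contain several touching or overlapping text lines when handwriting is dense. Such bands are presumably what the abstract calls the multiple lines set, which the later skeleton and path-search steps are meant to resolve.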
Keywords/Search Tags: handwritten Chinese recognition, segmentation of text lines, evaluation function