
The Research About The Preprocessing Part Of The Printed Uyghur Character Recognition System

Posted on: 2013-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2248330374966408
Subject: Computer application technology
Abstract/Summary:
Printed character recognition is an important branch of pattern recognition. Research in this area has developed rapidly and achieved considerable success, and text-recognition software for many languages is now on the market. However, because of the complicated structure of the Uyghur script, there has been little research on printed Uyghur character recognition, and the related techniques and software products are not yet mature. Research on printed Uyghur character recognition and the development of such software matter greatly for Uyghur culture and religion, and for the preservation and digitization of Uyghur text materials.

This thesis focuses mainly on text image preprocessing. All research, analysis and experiments were carried out on the Windows XP platform in the Visual C++ 6.0 programming environment. The work concentrates on the following aspects:

1. Analysis of Uyghur text structure, writing, and the characteristics of phrases, together with the definition of the text image baseline. Unlike many other languages, Uyghur is written in an Arabic-based script, so the letters inside a word are joined to one another, which makes segmentation more difficult.

2. Preprocessing of the input image, including binarization, noise reduction, skew correction, and the separation of main strokes from subsidiary strokes. Binarization and skew correction use conventional methods. For denoising, we propose our own methods, built on existing ones and adapted to the quality of the text images and the features of Uyghur.

3. Feature extraction from the input image. We extracted the ratio of width to length of the letters, the subsidiary strokes, the position of the letter, the position of the subsidiary strokes, the number of holes, and the stroke frequency. We also classified the letters and the subsidiary strokes, which helps improve text recognition.

4. Segmentation of the text image, including line segmentation, word segmentation and letter segmentation. Line and word segmentation are based on the spacing between text lines and between words. Letter segmentation uses the integral projection method, supplemented by some of our own rules for determining the cut position.

The research on preprocessing has achieved good results, but many aspects still need improvement, for instance the segmentation methods. Some functions of the system also need further work.
Keywords/Search Tags: Text image, Uyghur, Optical Character Recognition (OCR), Feature Extraction, Segmentation