The Research About The Preprocessing Part Of The Printed Uyghur Character Recognition System

Posted on:2013-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

GTID:2248330374966408

Subject:Computer application technology

Abstract/Summary:

Printed character recognition is an important branch of pattern recognition. Research in this area has developed rapidly and achieved considerable success, and text-recognition software for a variety of languages is now on the market. However, because of the complicated structure of the Uyghur script, there has been little research on printed Uyghur character recognition, and the related techniques and software products are not yet mature. Research on printed Uyghur character recognition, and the development of software for it, matters greatly for Uyghur culture and religion and for the preservation and digitization of Uyghur text materials.

This thesis focuses on text image preprocessing. All research, analysis, and experiments were carried out on the Windows XP platform in the Visual C++ 6.0 programming environment. The work concentrates on the following aspects:

1. Analysis of Uyghur text structure, writing, and the characteristics of phrases, together with the definition of the text image baseline. Unlike many other languages, Uyghur is written in an Arabic-based script, so the letters within a word are joined to one another, which makes segmentation more difficult.

2. Preprocessing of the input image, including binarization, noise reduction, skew correction, and separation of the main strokes from the subsidiary strokes. Binarization and skew correction use conventional methods. For denoising, we propose our own methods, which build on existing methods and are adapted to the quality of the text images and the features of Uyghur.

3. Feature extraction from the input image. We extracted the ratio of letter width to length, the subsidiary strokes, the position of the letter, the position of the subsidiary strokes, the number of holes, and the frequency of the strokes. We also classified the letters and the subsidiary strokes, which helps improve text recognition.

4. Segmentation of the text image, including line segmentation, word segmentation, and letter segmentation. Line and word segmentation are based on the spacing between text lines and between words. Letter segmentation uses the integral projection method, supplemented by some of the author's own methods for determining the cut position.

The preprocessing research has produced good results, but many aspects still need improvement, for instance the segmentation methods. Some functions also need further work.
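The projection-based segmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation (which was written in Visual C++ 6.0 and is not reproduced in this abstract). It assumes an already binarized image stored as a list of rows with 1 for ink and 0 for background, and it cuts only where a projection profile drops to zero. All function names are hypothetical. The author's own rules for choosing cut positions inside connected Uyghur words are not described here, so they are not modeled.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = ink (foreground), 0 = background


def projection(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def runs_above(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))  # a gap ends the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection."""
    return runs_above(projection(image, "rows"))


def segment_columns(line_image: Image) -> List[Tuple[int, int]]:
    """Split one text line at blank columns using the vertical projection."""
    return runs_above(projection(line_image, "cols"))


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
lines = segment_lines(page)
first_line = page[lines[0][0]:lines[0][1]]
pieces = segment_columns(first_line)
```

Because letters inside a Uyghur word are joined, blank-column cuts like these separate only words or unconnected word parts. Splitting connected letters would need an additional rule, such as cutting at low points of the vertical projection near the baseline, which this sketch leaves out.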

Keywords/Search Tags:

Text image, Uyghur, Optical Character Recognition (OCR), Feature Extraction, Segmentation

Related items

1	Research On Uyghur Text Recognition In The Scene Image
2	Research And Application Of Optical Character Recognition For China Second-generation Identity Card
3	Video Text Extraction Technology Research And Application
4	Study On Key Techniques Of Uyghur Character Recognition
5	Online Handwritten Uyghur Character And Word Recognition Based On The Platform
6	The Text Recognition And Translation System Based On Android Platform
7	Research On Uyghur Recognition Technology Based On Word Part
8	Application Of Text Detection Based On Semantic Segmentation In Receipt Optical Character Recognition
9	Research And Implementation Of The Optical Character Recognition Based On Android Platform
10	The Research Of Laser Engraving Metal Detonator Character Image Recognition Method And Application