Recognition Of Printed Uyghur Words Based On Segmentation

Posted on: 2016-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Lang
Full Text: PDF
GTID: 2348330488974149
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Character recognition is a branch of applied pattern recognition. Research on Uyghur recognition is significant for promoting multi-ethnic information technology in China and for upholding ethnic unity, but it started later than research on Chinese, English, and Japanese recognition. Uyghur recognition covers both printed and handwritten text, and Uyghur word recognition is either segmentation-based or whole-word based. This paper studies segmentation-based recognition of printed Uyghur words: a word is first segmented into characters, the characters are then recognized, and the recognition result is finally refined by post-processing. The specific research contents are as follows:

1. The paper introduces the research background and significance of Uyghur OCR. It also analyzes the features of printed Uyghur characters and the difficulties of printed Uyghur word recognition.

2. The paper builds a database of printed Uyghur characters. The database is collected from 11 commonly used printed Uyghur fonts, each contributing 128 Uyghur characters, for a total of 1408 samples. This database is the basis for the rest of the work.

3. Segmenting a word into characters is a difficult problem in Uyghur OCR. To address it, this paper proposes an improved segmentation algorithm that combines connected component labeling with vertical projection. Experimental results show that the algorithm effectively avoids distorting characters during segmentation. The paper also presents an improved algorithm for locating the baseline of a word.

4. For the recognition of printed Uyghur characters, this paper combines denoising based on connected component labeling with mathematical morphological filtering when preprocessing Uyghur words and characters. Experimental results show that this method effectively removes isolated noise points and repairs broken strokes in Uyghur words. Because Uyghur contains many similar-looking characters, the paper extracts directional line element features and gradient features and classifies them with a Euclidean distance classifier. Experimental results show that the average first-candidate recognition rate reaches 91.26%. Finally, the recognition results are post-processed with an HMM (hidden Markov model), and experiments confirm the validity of this step.

5. The paper develops an Android-based Uyghur handwriting input method. JNI (Java Native Interface) is used to port the PC-based Uyghur handwriting recognition system to Android phones, and switching between the two input modes is implemented on this basis.
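The abstract gives no implementation details, so the following is only a minimal sketch of two of the generic building blocks it names: splitting a binary word image on its vertical projection profile, and picking the nearest template by Euclidean distance. It is not the thesis's improved algorithm, which also uses connected component labeling and baseline calibration. The function names, the `min_ink` threshold, and the toy feature vectors are all hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def vertical_projection(image):
    """Count foreground pixels (value 1) in each column of a binary image.

    `image` is a list of rows, each row a list of 0/1 values.
    """
    return [sum(column) for column in zip(*image)]


def split_columns(image, min_ink=1):
    """Return (start, end) column ranges whose projection is >= min_ink.

    `end` is exclusive. `min_ink` is a hypothetical threshold: raising it
    cuts at thin strokes as well as at fully blank columns.
    """
    profile = vertical_projection(image)
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x                      # a segment begins here
        elif ink < min_ink and start is not None:
            segments.append((start, x))    # the segment ended at column x
            start = None
    if start is not None:                  # segment runs to the right edge
        segments.append((start, len(profile)))
    return segments


def nearest_template(feature, templates):
    """Return the label whose template vector is closest to `feature`.

    `templates` maps a label to a feature vector of the same length.
    """
    return min(templates, key=lambda label: dist(feature, templates[label]))


# Toy 3x5 binary image: two ink regions separated by one blank column.
toy_image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
segments = split_columns(toy_image)
label = nearest_template([0.9, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0]})
```

Because Uyghur letters within a word are usually joined along the baseline, blank-column cuts alone would rarely separate them. That is presumably why the thesis combines projection with connected component labeling rather than relying on the projection profile by itself.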
Keywords/Search Tags: Uyghur Segmentation, Printed Uyghur Recognition, HMM, Uyghur Feature Extraction