Classifier Design For Printed Uyghur Word Recognition

Posted on: 2020-10-12  Degree: Master  Type: Thesis
Country: China  Candidate: D D Li
GTID: 2428330602452495  Subject: Communication and Information System
Abstract/Summary:
Optical character recognition is an important branch of pattern recognition. In recent years, recognition technology for mainstream scripts such as Chinese and English has developed rapidly. By comparison, Uyghur recognition technology started relatively late, and related research results are few. The study of Uyghur recognition technology is of great significance for promoting the development of China's ethnic minority areas, inheriting and carrying forward the fine culture of ethnic minorities, and safeguarding national unity. Research on Uyghur recognition covers two areas: handwritten and printed text. For printed Uyghur, there are two approaches, one based on characters and one based on whole words. Taking the printed Uyghur word as the research object, this thesis studies the key technologies of printed Uyghur word recognition: preprocessing, feature extraction, and classification. The specific work is as follows:

1. The research background, significance, and current status of printed Uyghur recognition technology are introduced in detail, and the feasibility of Uyghur recognition based on characters, connected segments, and whole words is analyzed. This thesis takes words as the recognition unit. Drawing on two books, "5000 Words in Uyghur Language" and the "Report on the State of Language Living in China", 46 sets of printed Uyghur word samples were built. Each set contains 4207 word classes, and 4207 corresponding txt files of Uyghur encodings and txt files of Chinese translations were created.

2. The preprocessing and feature extraction stages of printed Uyghur word recognition are studied. Preprocessing consists of binarization, denoising, tilt correction, line segmentation, word segmentation, first-character segmentation, normalization, and contour extraction. These steps retain the original information of the printed Uyghur words to the greatest possible extent, which ensures the effectiveness of the subsequent feature extraction. For feature extraction, directional line element features and HOG features are extracted.

3. The classification algorithm for printed Uyghur word recognition is studied. Because the number of word classes is very large, a single classifier has difficulty meeting the real-time requirement of the system, so a cascade classifier is studied. Following the principle of looking up a word in a dictionary, a two-level cascade classifier is designed: the first level recognizes the first character of the word, and the second level recognizes the whole word. First, the HOG feature is extracted from the image of the word's first character and sent to an SVM classifier, which identifies that character. This narrows the candidate set to the subset of words that begin with the recognized character. Then the directional line element feature of the whole word is extracted and sent to a Euclidean distance classifier or a BP neural network classifier to obtain the recognition result. Experiments show that the proposed two-level cascade classifier effectively improves the recognition speed of the system.

4. Character recognition technology based on deep learning is applied to printed Uyghur word recognition. By improving the classical LeNet-5 model, a 6-layer convolutional neural network is used to recognize printed Uyghur words. Experiments show that Uyghur word recognition based on a convolutional neural network achieves better results.
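
The two-level cascade described in item 3 can be illustrated with a minimal sketch. It is not the thesis implementation: the class and function names are hypothetical, the feature vectors are assumed to be precomputed (HOG for the first character, directional line element features for the word), and a simple nearest-centroid rule stands in for the SVM first-level classifier so that the example depends only on NumPy. The second level is the Euclidean distance classifier mentioned in the abstract, restricted to words that share the predicted first character.

```python
import numpy as np
from collections import defaultdict


def nearest_centroid_classifier(centroids):
    """Stand-in for the first-level SVM (an assumption of this sketch).

    `centroids` maps each first-character label to a reference feature vector.
    """
    keys = list(centroids)
    matrix = np.asarray([centroids[k] for k in keys], dtype=float)

    def classify(char_feature):
        diffs = matrix - np.asarray(char_feature, dtype=float)
        return keys[int(np.argmin(np.linalg.norm(diffs, axis=1)))]

    return classify


class TwoLevelCascade:
    """First level picks the first character; second level searches only
    the word templates that start with that character."""

    def __init__(self, first_char_classifier):
        self.first_char_classifier = first_char_classifier
        self.buckets = {}

    def fit(self, word_features, word_labels, first_chars):
        # Group template indices by first character (the "dictionary" index).
        groups = defaultdict(list)
        for index, char in enumerate(first_chars):
            groups[char].append(index)
        features = np.asarray(word_features, dtype=float)
        labels = np.asarray(word_labels)
        self.buckets = {
            char: (features[idx], labels[idx]) for char, idx in groups.items()
        }
        return self

    def predict(self, char_feature, word_feature):
        # Level 1: recognize the first character.
        char = self.first_char_classifier(char_feature)
        templates, labels = self.buckets[char]
        # Level 2: Euclidean nearest template within the reduced subset.
        diffs = templates - np.asarray(word_feature, dtype=float)
        return labels[int(np.argmin(np.linalg.norm(diffs, axis=1)))]
```

The speed gain comes from the second level comparing against one bucket rather than all word classes. The trade-off is that a first-level error cannot be corrected by the second level, so the accuracy of the first-character classifier bounds the accuracy of the whole cascade.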
Keywords/Search Tags: Uyghur word recognition, first character segmentation, classifier cascade, LeNet-5 model