Research And Implementation Of Characters Detection And Recognition In Ancient Yi Books

Posted on: 2021-02-09 | Degree: Master | Type: Thesis
Country: China | Candidate: X Han
GTID: 2428330611462831 | Subject: Software engineering
Abstract/Summary:
Text is the soul of national culture and an important carrier for passing on knowledge, and it plays an important role in people's daily communication. Research on text processing is therefore an important part of computer research. Optical character detection and recognition has broad application prospects, including text detection and recognition in natural scenes, key-information extraction from bills, and character recognition in ancient books. There is already a large body of research on text detection and recognition in ancient books: abroad, mainly on ancient Latin manuscripts; in China, on ancient Chinese Buddhist scriptures and on ancient books in Uyghur and Mongolian. China has a long history of civilization and numerous ethnic groups, and beyond those mentioned above, the ancient books of many ethnic groups have still not been digitized. As the sixth largest minority nationality in China, the Yi nationality has its own unique culture, and the Yi language has more than one million users. Character detection and recognition in ancient Yi books is therefore of great significance for advancing their digitization. Focusing on ancient Yi characters from the Guizhou region, this thesis constructs a sample library of ancient Yi handwriting for character recognition in ancient Yi books, proposes a method for character detection and recognition in ancient Yi books, and designs and implements an automatic recognition system for ancient Yi books. The specific work of this thesis is as follows:

(1) With reference to the Concise Yi-Han Dictionary (Guizhou Version) and the General Yi-Chinese Dictionary, 3786 commonly used ancient Yi characters were compiled and a corresponding traditional Yi font was designed. The characters in the font file are used to generate a sampling table, and the corresponding handwritten ancient Yi samples are automatically extracted and labeled using techniques such as background filling, tilt correction, and area extraction. The samples are augmented with morphological and geometric transformations such as erosion, dilation, affine transformation, and rotation, which expands the sample library automatically. The storage format of the data set is designed with reference to the MNIST data set.

(2) For Yi scriptures with a simple layout and little noise, connected-region analysis and regression-based character segmentation achieve good detection results. For ancient Yi books with complex layouts, non-standard typesetting, and mixed text and graphics, we propose a character detection method based on maximally stable extremal regions (MSER) and a convolutional neural network (CNN). First, the scanned images of ancient Yi books are preprocessed with non-local means filtering. Second, a binary image is obtained with an improved local adaptive binarization method. Then, non-text areas are removed using heuristic rules. Finally, MSER and a CNN are combined to detect single characters. Experimental results show that the proposed approach effectively separates text from non-text areas, achieves high accuracy and recall in single-character detection, and effectively solves the problem of character detection in ancient-book recognition.

(3) To better train the single-character recognition model for ancient Yi, this thesis proposes a semi-automatic sample generation method that expands the sample size. Based on an analysis of how hyperparameters such as the number of convolutional layers, the number of convolution kernels, and the learning rate affect the performance of the convolutional neural network, a CNN architecture based on the Inception structure is proposed, and convolution based on cosine similarity is adopted in place of traditional convolution. Experiments show that the final recognition model effectively extracts features from samples with uneven illumination and is highly robust. The recognition accuracy on the constructed ancient Yi handwritten character set reaches 98.62%.

(4) An automatic recognition system for ancient Yi books is designed and implemented. It encapsulates the detection and recognition models and can automatically recognize the characters in any region selected from an image of an ancient Yi book, displaying the results in its output interface.
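The sample expansion described in item (1) can be sketched with plain NumPy. This is a minimal illustration, not the thesis's code. It assumes 2-D uint8 grayscale samples with bright strokes on a dark background (MNIST style), a 3×3 square structuring element, and nearest-neighbour resampling. The rotation angles (±5°) and the shear factor (0.1) are illustrative values, and all function names are hypothetical. With dark ink on a light background, erosion and dilation swap roles: erosion thickens the strokes.

```python
import numpy as np


def _window_stack(img: np.ndarray, k: int) -> np.ndarray:
    """Stack every k x k neighbour shift of img (edge pixels replicated)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])


def erode(img: np.ndarray, k: int = 3) -> np.ndarray:
    # Grayscale erosion = local minimum: bright strokes become thinner.
    return _window_stack(img, k).min(axis=0)


def dilate(img: np.ndarray, k: int = 3) -> np.ndarray:
    # Grayscale dilation = local maximum: bright strokes become thicker.
    return _window_stack(img, k).max(axis=0)


def warp_affine(img: np.ndarray, m: np.ndarray, fill: int = 0) -> np.ndarray:
    """Apply a 2x3 forward affine matrix m (dst = A @ src + t) to img by
    inverse mapping with nearest-neighbour sampling."""
    h, w = img.shape
    a, t = m[:, :2], m[:, 2]
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # rows: x, y
    src = np.linalg.inv(a) @ (dst - t[:, None])             # back to source
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=img.dtype)              # outside -> fill
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)


def rotation_matrix(angle_deg: float, h: int, w: int) -> np.ndarray:
    """2x3 matrix rotating by angle_deg about the image centre."""
    theta = np.deg2rad(angle_deg)
    r = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    c = np.array([(w - 1) / 2, (h - 1) / 2])
    return np.hstack([r, (c - r @ c)[:, None]])


def augment(img: np.ndarray) -> list:
    """Return morphological and geometric variants of one sample."""
    h, w = img.shape
    shear = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]])  # illustrative shear
    variants = [erode(img), dilate(img), warp_affine(img, shear)]
    variants += [warp_affine(img, rotation_matrix(a, h, w)) for a in (-5, 5)]
    return variants
```

Each labeled sample then yields five extra samples with the same label. Random angles or shear factors drawn per sample would give more variety than the fixed values used here.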
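Item (1) says the data set's storage format is modeled on MNIST. MNIST uses the IDX format. Each file starts with a 4-byte magic number: two zero bytes, an element-type code, and the number of dimensions. Each dimension size follows as a big-endian 32-bit integer, and then the data in row-major order. The standard MNIST label file stores labels as unsigned bytes (magic number 2049), which allows only 256 classes, fewer than the 3786 characters here. The sketch below therefore assumes 16-bit labels (type code 0x0B). The thesis does not say how it handled this, and the function names are hypothetical.

```python
import struct
from math import prod
from typing import List, Sequence, Tuple

# IDX element-type codes (3rd byte of the magic number) -> struct format char.
# 0x08 = unsigned byte (MNIST pixels and labels), 0x0B = signed 16-bit short.
IDX_TYPES = {0x08: "B", 0x0B: "h"}


def pack_idx(values: Sequence[int], dims: Sequence[int], type_code: int) -> bytes:
    """Serialise a flat, row-major sequence of values in IDX format."""
    if prod(dims) != len(values):
        raise ValueError("dims do not match the number of values")
    # Magic number: two zero bytes, element type, number of dimensions.
    header = struct.pack(">BBBB", 0, 0, type_code, len(dims))
    # Each dimension size as a big-endian unsigned 32-bit integer.
    header += struct.pack(">%dI" % len(dims), *dims)
    body = struct.pack(">%d%s" % (len(values), IDX_TYPES[type_code]), *values)
    return header + body


def unpack_idx(buf: bytes) -> Tuple[List[int], List[int]]:
    """Inverse of pack_idx: return (dims, flat list of values)."""
    z1, z2, type_code, ndim = struct.unpack_from(">BBBB", buf, 0)
    if z1 or z2:
        raise ValueError("not an IDX buffer")
    dims = list(struct.unpack_from(">%dI" % ndim, buf, 4))
    fmt = ">%d%s" % (prod(dims), IDX_TYPES[type_code])
    values = list(struct.unpack_from(fmt, buf, 4 + 4 * ndim))
    return dims, values
```

Writing a file is then a single call, for example `open(path, "wb").write(pack_idx(pixels, [n, 28, 28], 0x08))`. The 28×28 sample size here is hypothetical and would be replaced by whatever size the sample library uses.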
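The binarization and non-text removal steps of item (2) can be illustrated as follows. The thesis's "improved" local adaptive binarization and its heuristic rules are not specified here. This sketch swaps in the standard Sauvola method, which uses the threshold T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation in a window around each pixel. It then applies a few hypothetical geometric rules (area, aspect ratio, fill ratio) to 8-connected components. All parameter values are illustrative. The non-local means and MSER + CNN steps are not shown. OpenCV provides `cv2.fastNlMeansDenoising` and `cv2.MSER_create` for the first two of those.

```python
from collections import deque
from typing import List, Tuple

import numpy as np


def sauvola_binarize(gray: np.ndarray, window: int = 15, k: float = 0.2,
                     r: float = 128.0) -> np.ndarray:
    """Boolean ink mask: True where a pixel is darker than its local
    Sauvola threshold T = m * (1 + k * (s / r - 1))."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    g = gray.astype(np.float64)
    h, w = g.shape
    p = np.pad(g, window // 2, mode="edge")
    # Integral images of p and p**2, with a leading row/column of zeros.
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii2 = np.zeros_like(ii)
    ii[1:, 1:] = p.cumsum(0).cumsum(1)
    ii2[1:, 1:] = (p * p).cumsum(0).cumsum(1)

    def box_sum(s: np.ndarray) -> np.ndarray:
        # Sum over the window centred on every original pixel.
        return (s[window:window + h, window:window + w] - s[:h, window:window + w]
                - s[window:window + h, :w] + s[:h, :w])

    n = window * window
    mean = box_sum(ii) / n
    std = np.sqrt(np.maximum(box_sum(ii2) / n - mean ** 2, 0.0))
    return g < mean * (1.0 + k * (std / r - 1.0))


def connected_boxes(mask: np.ndarray) -> List[Tuple[int, int, int, int, int]]:
    """8-connected components as (top, left, bottom, right, pixel_area)."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue = deque([(int(y0), int(x0))])
        top, left, bottom, right, area = int(y0), int(x0), int(y0), int(x0), 0
        while queue:  # breadth-first flood fill
            y, x = queue.popleft()
            area += 1
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        boxes.append((top, left, bottom, right, area))
    return boxes


def keep_text_like(boxes, min_area=20, max_area=5000,
                   min_aspect=0.2, max_aspect=5.0, min_fill=0.1):
    """Hypothetical heuristic rules: drop specks, long rules and sparse blobs."""
    kept = []
    for top, left, bottom, right, area in boxes:
        bh, bw = bottom - top + 1, right - left + 1
        aspect, fill = bw / bh, area / (bh * bw)
        if (min_area <= area <= max_area and min_aspect <= aspect <= max_aspect
                and fill >= min_fill):
            kept.append((top, left, bottom, right))
    return kept
```

Sauvola's threshold falls below the local mean when contrast is low, which suppresses background texture on aged paper. The geometric rules then discard specks and ruling lines before any character-level classifier runs.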
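Item (3) replaces ordinary convolution with convolution based on cosine similarity. The abstract does not give the exact formulation. A common reading is that each output is the cosine of the angle between a kernel w and an input patch x, (w · x) / (‖w‖ ‖x‖), instead of the raw dot product w · x. The NumPy forward pass below is a minimal sketch under that assumption: no bias, stride 1, no padding, and a hypothetical function name. The cosine does not change when a patch is multiplied by a positive constant, so the response does not depend on multiplicative brightness changes. This is one plausible reason such a layer copes with uneven illumination. It is not, however, invariant to an additive brightness offset.

```python
import numpy as np


def cosine_conv2d(x: np.ndarray, weight: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """'Valid' cosine-similarity convolution (cross-correlation), stride 1.

    x:      input of shape (C, H, W)
    weight: kernels of shape (K, C, kh, kw)
    returns an array of shape (K, H - kh + 1, W - kw + 1) with values in [-1, 1]
    """
    c, h, w = x.shape
    k, wc, kh, kw = weight.shape
    if wc != c:
        raise ValueError("channel mismatch between input and kernels")
    oh, ow = h - kh + 1, w - kw + 1
    # im2col: one row per output position holding the flattened C*kh*kw patch.
    cols = np.empty((oh * ow, c * kh * kw), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()
    w_flat = weight.reshape(k, -1).astype(np.float64)
    dots = cols @ w_flat.T                        # ordinary convolution responses
    norms = (np.linalg.norm(cols, axis=1, keepdims=True)
             * np.linalg.norm(w_flat, axis=1))    # |x| * |w| for every pair
    out = dots / (norms + eps)                    # eps guards all-zero patches
    return out.T.reshape(k, oh, ow)
```

Training such a layer would need an autodiff framework. The forward computation there would be the same: divide each convolution response by the product of the patch norm and the kernel norm.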
Keywords/Search Tags: ancient Yi handwriting sample library, maximally stable extremal regions, convolutional neural network, automatic recognition system of ancient Yi books