
Research Of The Segmentation-free Manchu Word Recognition

Posted on: 2019-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: D Huang
GTID: 2428330566985055
Subject: Computational Mathematics
Abstract/Summary:
"Manchu" is the language and script of the Manchu people. During the Qing Dynasty, Manchu was promoted and used as an official language, and a large body of historical Manchu documents has survived. These documents cover a wide range of fields, and some of their content does not appear in Chinese documents, so historical Manchu documents have important value for historical research. Today the Manchu language is nearly extinct, which makes the digital preservation of historical Manchu documents urgent. As libraries increasingly need to digitize these documents, research on optical character recognition (OCR) of Manchu script has become very important.

Previous studies of Manchu recognition usually segment words first and then recognize the resulting letters or primitives. Because Manchu words have a complex structure, Manchu letters cannot always be segmented correctly, which limits the accuracy of the subsequent character recognition. In addition, reconstructing characters from the recognized segments remains an unsolved problem. This thesis therefore proposes a recognition method for unsegmented (segmentation-free) Manchu words. The main methods are as follows:

1. A directed acyclic graph support vector machine (DAG-SVM) classifier with a polynomial kernel function is proposed to classify segmentation-free Manchu words. The classifier reached a recognition rate of 100% on 10 classes; the rate began to decline at 20 classes, falling to 90% on 40 classes and 78% on 100 classes. These results show that segmentation-free Manchu word recognition is feasible, but also that the classification performance of the DAG-SVM drops noticeably as the number of classes grows.

2. To improve the recognition rate on test sets with many classes, a convolutional neural network (CNN) is used to classify segmentation-free Manchu words. Unlike hand-designed shallow feature extractors, a CNN can extract deep features, which improves the recognition rate. On 100 classes of segmentation-free Manchu words, the CNN achieved a recognition rate of 99.10%, which is 21.10 percentage points higher than that of the DAG-SVM, and on 671 classes of segmentation-free Manchu words it achieved 97.89%. These results show that the CNN is suitable for recognizing and classifying segmentation-free Manchu words across a large number of categories.

3. To let the classifier handle segmentation-free Manchu word images of arbitrary size without size normalization, the last max-pooling layer of the CNN is replaced with a spatial pyramid pooling (SPP) layer. The improved network can be trained on Manchu word images of arbitrary size, avoiding normalization, which can reduce the recognition rate. Experiments show that the improved CNN achieves a recognition rate of 98.84% on segmentation-free Manchu words, higher than that of the conventional CNN.

4. Training a deep neural network requires a large amount of data. However, Manchu documents are scarce and difficult to obtain, so the collected data set is small and does not provide adequate training data for machine learning models. This thesis therefore expands the original Manchu images through data synthesis. Experiments show that the data set expanded by data synthesis is applicable to all three classification methods above.
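The DAG-SVM in the first method combines one-versus-one binary SVMs into a decision DAG: with N classes, a sample is classified after exactly N - 1 pairwise tests, each of which compares the first and last remaining candidate classes and discards the loser. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation. The kernel parameters (`gamma`, `coef0`, `degree`) are hypothetical defaults, and because no trained SVMs are available here, the pairwise decision is replaced by a hypothetical stand-in that prefers the class whose prototype is closer in the polynomial-kernel feature space.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def polynomial_kernel(x: Vector, y: Vector, gamma: float = 1.0,
                      coef0: float = 1.0, degree: int = 3) -> float:
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree (parameters are assumptions)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def dag_predict(classes: Sequence[int], x: Vector,
                prefers_first: Callable[[int, int, Vector], bool]) -> int:
    """Classify x with a decision DAG: N classes take exactly N - 1 pairwise tests.

    prefers_first(i, j, x) stands in for the sign of the trained
    one-vs-one SVM that separates class i from class j (hypothetical callback).
    """
    candidates: List[int] = list(classes)
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        if prefers_first(i, j, x):
            candidates.pop()      # "not j": drop the last candidate
        else:
            candidates.pop(0)     # "not i": drop the first candidate
    return candidates[0]


# --- Hypothetical stand-in for trained pairwise SVMs ----------------------
# One prototype vector per class; a pair is decided by squared distance in
# the kernel-induced feature space: K(x, x) - 2 K(x, p) + K(p, p).
PROTOTYPES: Dict[int, Vector] = {0: [0.0, 0.0], 1: [1.0, 0.0],
                                 2: [0.0, 1.0], 3: [1.0, 1.0]}


def feature_distance_sq(x: Vector, p: Vector) -> float:
    return polynomial_kernel(x, x) - 2 * polynomial_kernel(x, p) + polynomial_kernel(p, p)


def toy_prefers_first(i: int, j: int, x: Vector) -> bool:
    return feature_distance_sq(x, PROTOTYPES[i]) <= feature_distance_sq(x, PROTOTYPES[j])


if __name__ == "__main__":
    print(dag_predict(sorted(PROTOTYPES), [0.9, 0.95], toy_prefers_first))  # prints 3
```

The DAG only ever evaluates N - 1 of the N(N - 1)/2 trained pairwise classifiers per sample, which keeps prediction cheap even when the number of word classes grows.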
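The SPP layer in the third method is what lets the network accept word images of arbitrary size: it max-pools each feature-map channel over a fixed set of grids, so the output length depends only on the number of channels and grid levels, not on the image's height and width. The plain-Python sketch below assumes pyramid levels of 1×1, 2×2 and 4×4 (the abstract does not state the levels used) and computes bin boundaries with floor/ceil in the same way as adaptive max pooling, so neighbouring bins may overlap by one cell when a dimension is not divisible by the grid size. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # one channel: H rows x W columns


def bin_bounds(size: int, bins: int, idx: int) -> Tuple[int, int]:
    """Half-open [start, end) range of bin `idx` when `size` cells are split into `bins` bins."""
    start = (idx * size) // bins            # floor(idx * size / bins)
    end = -((-(idx + 1) * size) // bins)    # ceil((idx + 1) * size / bins)
    return start, end


def spp_channel(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one channel over an n x n grid for every pyramid level n."""
    h, w = len(fmap), len(fmap[0])
    pooled: List[float] = []
    for n in levels:
        for bi in range(n):
            r0, r1 = bin_bounds(h, n, bi)
            for bj in range(n):
                c0, c1 = bin_bounds(w, n, bj)
                pooled.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(channels: Sequence[FeatureMap],
                         levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate per-channel pyramids; length is C * sum(n * n for n in levels)."""
    out: List[float] = []
    for fmap in channels:
        out.extend(spp_channel(fmap, levels))
    return out


if __name__ == "__main__":
    # Two single-channel maps of different sizes, e.g. from a short and a long word image.
    small = [[float(r * 5 + c) for c in range(5)] for r in range(3)]   # 3 x 5
    large = [[float(r + c) for c in range(17)] for r in range(6)]      # 6 x 17
    print(len(spatial_pyramid_pool([small])), len(spatial_pyramid_pool([large])))  # prints 21 21
```

Both maps produce 1 + 4 + 16 = 21 values per channel, so the fully connected layer that follows can have a fixed input size while the convolutional layers see the word image at its original proportions.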
Keywords/Search Tags: Segmentation-free Manchu Word Recognition, Directed Acyclic Graph-Support Vector Machine (DAG-SVM), Convolutional Neural Network (CNN), Spatial Pyramid Pooling (SPP), Data Synthesis