
Research On Text Recognition Of Tangut Ancient Books With Unbalanced Samples

Posted on: 2024-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: T H Wang
Full Text: PDF
GTID: 2555306926475354
Subject: Computer technology
Abstract/Summary:
The ancient books of the Tangut (Western Xia) dynasty are its historical records and are of great value for studying Tangut national culture. To protect and study that culture more effectively, recognizing and translating Tangut characters is a key step. However, Tangut character recognition still faces several problems: data from Tangut ancient books is hard to collect, the classes in the data set are unbalanced, the character structure is complex, and the script is often illegible. To address these problems, this thesis applies data augmentation and convolutional neural networks to the recognition of Tangut text images. The main contents are as follows:

(1) To address the imbalance in text image data from Tangut ancient books, this thesis proposes a data augmentation method for such images, built on a public character data set of Tangut ancient books. The method combines traditional data augmentation with a generative adversarial network to generate new samples and enlarge the data set. The generated images retain the original features while gaining new ones, which increases image diversity. The experimental results show that the method effectively mitigates the class imbalance in Tangut ancient-book text and improves the model's recognition accuracy for Tangut characters.

(2) For model selection, this thesis compares the recognition and classification accuracy of five pretrained convolutional neural network models on different data sets and chooses DenseNet as the main model for Tangut recognition. Because the DenseNet recognition model may be affected by class imbalance during training, which lowers its recognition accuracy on classes with few samples, this thesis further proposes a Tangut recognition model based on the Densenet-FL-RC3 network. It adds a loss function that targets hard-to-classify samples and a branch that randomly crops feature maps, so that the network pays more attention to hard-to-classify samples and can select more features at the same time. This reduces the influence of sample imbalance on training and improves recognition accuracy.

This study is significant for the protection and reuse of Tangut ancient books, and it provides useful ideas and methods for their digital application.
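As an illustration of how a loss function can make a network focus on hard-to-classify samples, the following is a minimal sketch of focal loss in plain Python. It rests on an assumption: the abstract does not name the loss, and reading the "FL" in Densenet-FL-RC3 as focal loss is a guess. The function names and the `gamma` and `alpha` defaults are illustrative and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t)^gamma shrinks the loss of well-classified samples, so
    hard samples contribute relatively more to training.
    gamma and alpha here are assumed defaults, not the thesis settings.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Standard cross-entropy, which is focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


# An easy sample (true class 0 scored highest) and a hard one (true class 0 scored low).
easy = [4.0, 0.0, 0.0]
hard = [0.0, 1.0, 0.0]

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} kept={fl / ce:.3f}")
```

The "kept" ratio printed for each sample equals (1 - p_t)^gamma, so the easy sample keeps only a small fraction of its cross-entropy loss while the hard sample keeps much more. With unbalanced classes, samples from rare classes tend to be the hard ones, which is why such a loss can reduce the effect of imbalance.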
Keywords/Search Tags: Tangut ancient books, Character recognition, DenseNet network, Loss function, Random cropping branch of feature maps