| Digitizing Tangut characters is an important part of preserving the Tangut script. In recent years, with the development of deep learning, many Tangut character recognition systems have been built that recognize Tangut characters automatically and resolve some of the difficulties of the task. However, because Tangut characters are structurally complex, have many strokes, and closely resemble one another, annotating Tangut character datasets is difficult: researchers spend considerable time and effort labeling similar characters, and recognition accuracy under supervised learning remains low. Meanwhile, because Xixia documents are very old and incompletely preserved, Tangut character datasets contain many images of incomplete text, which lowers recognition accuracy on such text. To address these issues, this thesis studies unsupervised recognition algorithms for Tangut characters. The specific contributions are as follows. (1) To address the difficulty of labeling the Tangut character dataset and the low accuracy in recognizing similar characters, a Tangut character recognition algorithm based on an Unsupervised Two View Comparative Model (UTVCM) is proposed. First, a dual-view contrastive structure for Xixia text images is proposed: a second view obtained by changing the color channels of an input image forms a positive sample pair with the original, while the remaining input images serve as negative samples. This exploits the feature structure of the data itself and removes the need for labels when building positive and negative sample pairs. Second, on top of the unsupervised dual-view contrastive model, a variable perceptron is added to the feature extractor to form a dynamic feature extractor, so that the extracted local structural features are better suited to the text 
itself. Then, the contrastive loss function is improved so that positive sample pairs lie closer together in the feature space and positive and negative samples lie farther apart. Finally, the Tangut character recognition model is trained with the unsupervised dual-view contrastive method and its recognition performance is verified. Experimental results show that UTVCM achieves higher recognition accuracy than other supervised and unsupervised methods and, because a variable perceptron is added to the network, also achieves higher accuracy on similar characters. (2) To improve the recognition accuracy of incomplete Tangut characters, a novel unsupervised Transformer-based Tangut character recognition algorithm is proposed. First, a progressive shrinking Transformer is used as the backbone network and an attention mechanism is added, which integrates the global and local features of the text and strengthens the model's feature representation. Second, both raw and masked images are used as inputs during training to improve recognition accuracy on incomplete text. Then, different multi-layer perceptron designs are compared experimentally. Finally, the Tangut character recognition model is trained with the unsupervised Transformer method and its performance is verified. Experimental results show that the model's recognition accuracy on the Tangut character dataset exceeds that of most unsupervised recognition models on this dataset, while also improving the recognition accuracy of incomplete characters. (3) Based on the two algorithms above, a Tangut character recognition system is designed. It implements model selection and result display and integrates the proposed models with their trained parameters, helping researchers recognize text quickly and with higher accuracy and thereby supporting the study 
of Xixia history. |
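
The dual-view pairing described in (1) can be illustrated with a minimal sketch. It assumes a standard InfoNCE-style contrastive loss, which is a stand-in and not the improved loss function of the thesis, and it omits the dynamic feature extractor entirely. The function names (`channel_shuffle_view`, `info_nce_loss`), the channel order, and the temperature value are hypothetical choices made only for illustration.

```python
import math


def channel_shuffle_view(image, order=(2, 0, 1)):
    """Build a second view of an image by permuting its colour channels.

    `image` is a list of (r, g, b) pixels; `order` is an assumed permutation.
    """
    return [tuple(pixel[c] for c in order) for pixel in image]


def l2_normalize(vector):
    """Scale a vector to unit length (an all-zero vector is left as is)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.5):
    """Generic InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] embed two views of the same image (the
    positive pair); every positives[j] with j != i acts as a negative.
    No labels are needed to form the pairs.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every second-view embedding.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)
```

In use, each image in a batch and its channel-shuffled view would be passed through the same feature extractor, and the two resulting lists of embeddings would be given to `info_nce_loss`. The loss falls as the two views of one character move closer together and views of different characters move apart.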