| Chinese classics and documents bear witness to thousands of years of splendid Chinese culture. They embody the wisdom of countless generations of Chinese people and contain a great deal of knowledge waiting to be tapped. In recent years, with the rise of digital humanities research, scholars have begun to use its new research paradigm to explore the knowledge in ancient books. At present, in the field of digital humanities, research on lexical-level computation for ancient Chinese is relatively mature, but research at the semantic level is still in its infancy. The construction of semantic relation resources is the basis of semantic research, and synonymy is an important semantic relation. Few scholars, however, have studied the automatic construction of ancient Chinese synonym resources. This study therefore focuses on synonyms in ancient Chinese, aiming to propose a method that automatically extracts synonyms from the full text of ancient Chinese books and to construct a synonym dictionary of ancient Chinese, so as to provide resource support for the mining and utilization of ancient Chinese books. The large number of synonyms in ancient Chinese provides the basis for this study. Most existing studies have explored synonym extraction methods based on a monolingual corpus. This paper makes full use of the original texts of ancient books and their corresponding translations, takes the full text of the first four histories and the corresponding translation corpus as the experimental objects, and carries out research on ancient Chinese synonym extraction, synonym dictionary construction and dictionary application, including the following research contents: 1) Synonym extraction algorithm based on word alignment. The research on the synonym extraction algorithm based on word alignment aims to achieve unsupervised synonym extraction from the corpus of the first four histories. Specifically, it includes
the following four parts. The first part constructs the synonym extraction corpus of the first four histories to provide data support for the extraction algorithm. In the second part, a synonym extraction algorithm used for modern Chinese and English is transferred to ancient Chinese: the widely used Word2vec algorithm serves as the baseline. Comparison with the other algorithms proposed in this paper shows that it is poorly suited to ancient Chinese. In the third part, the IBM algorithm implemented by the fast_align tool is used to construct the word alignment corpus of the first four histories, which provides corpus support for the synonym extraction algorithm. In the fourth part, synonyms are extracted without supervision based on the hypothesis that "two words translated as the same word may have a synonymy relationship". A total of 16,272 groups of results were obtained, with an extraction accuracy of 40.12%. The extracted synonym clusters and the word alignment corpus of the first four histories provide data support for the subsequent supervised learning, which further optimizes the results. 2) Optimization of synonym extraction based on Siamese network. Because Siamese networks perform well at judging similarity, this part adopts a Siamese network structure to construct a synonymy judgment model for ancient Chinese, and converts the problem of word relationships into a binary classification of whether a word pair is synonymous. In this supervised method, the proofread synonym clusters are used to construct a data set for training the model parameters, so that each word pair receives a second judgment of whether it is synonymous, that is, the original synonym clusters are optimized. The accuracy of the synonyms after optimization reached 84.21%. The synonymy judgment model is used to construct the final synonym dictionary of ancient Chinese, which, as the result of this study, can be applied to other digital humanities studies. 3) Application of ancient Chinese
synonyms. This part studies the application of the ancient Chinese synonyms. The author designs a classification task for the relationships between characters in classical books and, using a Records of the Historian corpus annotated with those relationships, examines whether the constructed ancient Chinese synonym dictionary improves model performance. The experimental results show that, compared with the model without synonyms, the accuracy of the model with synonyms improves by 1.33% and the partial accuracy increases by 1.03%, which verifies the quality and value of the ancient Chinese synonym dictionary constructed in this study in the fields of natural language processing and digital humanities. |
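
The extraction hypothesis stated in the abstract, that "two words translated as the same word may have a synonymy relationship", can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `extract_synonym_candidates`, the `min_count` threshold and the toy word pairs are all assumptions made for illustration, and the input is assumed to be a list of (ancient word, modern translation) pairs already produced by a word aligner.

```python
from collections import defaultdict


def extract_synonym_candidates(aligned_pairs, min_count=1):
    """Group ancient-Chinese words that align to the same modern word.

    aligned_pairs: iterable of (ancient_word, modern_word) tuples, assumed
    to come from a word-aligned parallel corpus.
    min_count: hypothetical noise filter; an alignment must be seen at
    least this many times to be kept.
    """
    # counts[modern_word][ancient_word] = number of times the pair was aligned
    counts = defaultdict(lambda: defaultdict(int))
    for ancient, modern in aligned_pairs:
        counts[modern][ancient] += 1

    clusters = {}
    for modern, ancient_words in counts.items():
        kept = sorted(w for w, c in ancient_words.items() if c >= min_count)
        # Only a group of two or more words is a synonym candidate cluster.
        if len(kept) >= 2:
            clusters[modern] = kept
    return clusters


# Hypothetical toy alignments, not taken from the study's corpus.
pairs = [
    ("曰", "说"), ("云", "说"), ("言", "说"),
    ("卒", "死"), ("崩", "死"),
    ("王", "王"),
]
clusters = extract_synonym_candidates(pairs)
for modern, ancient_words in clusters.items():
    print(modern, ancient_words)
```

Candidate clusters produced this way are noisy, because one modern word can translate ancient words with different senses. This is consistent with the abstract's second stage, in which a supervised model re-judges each word pair.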