
Research On The Anaphoric Resolution Of Personal Pronouns In Classical Books Based On Deep Learning

Posted on: 2021-10-24  Degree: Master  Type: Thesis
Country: China  Candidate: S Chen
GTID: 2518306608461644  Subject: Books intelligence
Abstract/Summary:
China's long cultural history has produced many precious ancient classics. Their texts contain rich historical information and record the philosophy of earlier generations; they form the foundation of the national culture and are crucial to promoting and passing on traditional culture. In the information age, applying information processing technology to ancient Chinese classics, an important carrier of national culture, for deep exploration and knowledge discovery is of great significance: it benefits both the development and inheritance of traditional culture and the promotion of national cultural soft power.

A personal pronoun is a pronoun that refers to a person entity in natural language. A complete referential relation is composed of the "reference language", the expression that does the referring (here the pronoun), and the content it refers to, the "antecedent language". Although personal pronouns in ancient Chinese classics serve the same function as those in modern Chinese, ancient and modern Chinese differ in grammar, vocabulary, and other respects. Correctly identifying personal pronouns in ancient Chinese therefore plays an important role in the in-depth study of the classics. This paper examines reference resolution in ancient Chinese classics in depth and compares methods of personal pronoun recognition and reference resolution based on traditional machine learning and on deep learning. It focuses on the following three points:

(1) Construct a corpus for the anaphora resolution of personal pronouns. Taking the electronic text of the historical records as the source corpus and the part-of-speech tag set developed at Nanjing Agricultural University as the starting point, this paper analyzes the characteristics of personal pronoun reference relations in ancient Chinese, revises the tag set to address defects in the existing labeling, establishes annotation specifications for the anaphora resolution corpus, and builds the personal pronoun anaphora resolution corpus required for the experiments. The corpus is based on ancient Chinese and is rich in person information and intra-sentence referential relations, so it meets the needs of the experiments in this paper.

(2) Conduct personal pronoun recognition experiments on the classics with a CRF model, representing traditional machine learning, and a BERT model, representing deep learning. These experiments lay the foundation for the deep-learning-based anaphora resolution research in the following chapters. First, the CRF model framework is introduced, followed by feature selection and the feature templates. Next, the BERT model is introduced and trained on the word-unit corpus without part-of-speech tags. Finally, the experimental results are compared and evaluated. In the CRF experiments, introducing part-of-speech features and using word units gives the best pronoun recognition, with an average F value of 91.83%. With the same word-unit segmentation without part of speech, the BERT model recognizes pronouns better than the CRF model, with an average F value of 90.85%, and it is also applicable to personal pronoun recognition on small-scale corpora.

(3) Use the Bi-LSTM-CRF model and the BERT model for the anaphora resolution of personal pronouns. First, the Bi-LSTM-CRF experiments use word embeddings to capture deep implicit semantic features, and four experiments are run to form three groups of controlled comparisons. The first group compares the word-based and character-based corpora without part of speech. The second adds an attention mechanism to the experiment on the word-unit corpus without part of speech. The third adds part-of-speech features to the word-unit corpus and compares it with the earlier word-unit corpus without part of speech, to explore the effect of part-of-speech features on pronoun reference resolution. The results show that, without part of speech, the word-unit corpus outperforms the character-unit corpus; adding the attention mechanism improves reference resolution; and adding part-of-speech features greatly improves the model's resolution performance. Second, the BERT model is tuned on the training corpus to its best experimental parameters for the reference resolution experiment; after ten-fold cross-validation, the average F value is 82.43%. Finally, the resolution results of each experiment are analyzed visually. The Bi-LSTM-CRF model with part-of-speech features and word-unit segmentation achieves the best resolution performance, with an average F value of up to 84.00%.
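The abstract reports averaged F values, including one obtained by ten-fold cross-validation, but does not say how they were computed. As a minimal sketch only, assuming token-level scoring against a single positive tag, the averaging could look like the following. The tag name "PRON" and the helper names `prf` and `mean_f` are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence, Tuple


def prf(gold: Sequence[str], pred: Sequence[str],
        positive: str = "PRON") -> Tuple[float, float, float]:
    """Token-level precision, recall and F for one fold.

    `positive` is a hypothetical tag marking a personal pronoun; the
    thesis's actual tag set is not given in the abstract.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


def mean_f(folds: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average the per-fold F values, as in k-fold cross-validation."""
    scores = [prf(gold, pred)[2] for gold, pred in folds]
    return sum(scores) / len(scores)


# Two toy folds (made-up labels, for illustration only).
toy_folds = [
    (["PRON", "O", "PRON", "O"], ["PRON", "PRON", "O", "O"]),
    (["PRON", "O"], ["PRON", "O"]),
]
average_f = mean_f(toy_folds)
```

With ten folds, `toy_folds` would hold ten (gold, predicted) pairs, one per held-out split. Whether the thesis averages per-fold F values or pools counts across folds is not stated, so this choice is an assumption.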
Keywords/Search Tags: Anaphora resolution, Personal pronouns, Classics, Ancient Chinese