
Automatic Detection And Analysis Of Chinese Lexical Changes Based On Diachronic Corpora

Posted on: 2019-11-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zheng  Full Text: PDF
GTID: 2428330545953842  Subject: Computer technology
Abstract/Summary:
Language develops even more rapidly in the Internet age, which poses challenges for natural language processing and linguistics. Since words are the basic units of language, the analysis of word change is a fundamental part of historical linguistics and lexical semantics. The availability of large-scale real texts and the development of deep learning techniques have made it possible to study lexical change over long time spans. This thesis studies the recognition and analysis of Chinese lexical changes based on diachronic corpora, which contributes both to the study of the language itself and to applications in natural language processing and lexicography. Two kinds of change are examined: word frequency change and word meaning change. The specific content is as follows:

(1) Study on word frequency change based on word stability. We first compare measures of word stability, investigate the diachronic frequency changes of words with different stability, and then identify the historical context in which words appear. Word stability is applied to the extraction of core words from the Tongyici Cilin and to updating the vocabulary syllabus for HSK.

(2) Study on word meaning change based on traditional word vectors. The word vector is an effective representation of word meaning, and here it is trained separately for each time period. Because each period has its own word sense space, similarity cannot be calculated directly between word vectors from different periods. This problem is solved by taking the intersection of similar words. The word similarity change curve and the distribution of neighboring words are used to identify when a word's meaning changed and to analyze the reason for the change.

(3) Study on word meaning change based on diachronic word vectors. To account for diachronic influence on the word vector, a diachronic vector is proposed: the word vectors of a time period are initialized with the word vectors trained on the adjacent time period. The results show that the diachronic vector represents word meaning more reasonably than the traditional word vector. Cosine distance is used to calculate the change in word similarity, which, together with the change in neighboring words, identifies when a word's meaning changed and reveals the trend of the change. This thesis applies the method to the analysis of the formation and development of the metaphorical meanings of words, and combines it with a clustering algorithm to help generate the cross-domain mappings of conceptual metaphors.
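The neighbor-intersection idea in (2) can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not say how the intersection is scored, so the Jaccard overlap used here, the function names `nearest_neighbors` and `neighbor_overlap`, and the toy two-dimensional vectors are all assumptions made for illustration. The point it shows is that neighbor sets can be compared across periods even when the vector spaces themselves are not aligned.

```python
import numpy as np


def nearest_neighbors(word, vocab, vectors, k):
    """Return the set of the k words closest to `word` by cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != word]
    return set(ranked[:k])


def neighbor_overlap(word, period_a, period_b, k=2):
    """Jaccard overlap of a word's neighbor sets in two periods (assumed metric).

    Each period is a (vocab, vectors) pair trained independently, so the
    vectors are never compared across periods, only the neighbor words.
    """
    neighbors_a = nearest_neighbors(word, *period_a, k)
    neighbors_b = nearest_neighbors(word, *period_b, k)
    return len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)


# Toy data (hypothetical): "mouse" sits near animals in one period
# and near computer hardware in the other.
vocab = ["mouse", "rat", "cat", "keyboard", "screen"]
vectors_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]])
vectors_b = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]])
period_a = (vocab, vectors_a)
period_b = (vocab, vectors_b)

# The same period expressed in a rotated space: coordinates differ,
# but the neighbor words do not.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
period_a_rotated = (vocab, vectors_a @ rotation.T)

print(neighbor_overlap("mouse", period_a, period_b))
print(neighbor_overlap("mouse", period_a, period_a_rotated))
```

A low overlap between two periods suggests that the word's meaning shifted, while the rotated copy of a single period keeps a full overlap, which is why neighbor words are a usable bridge between separately trained spaces.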
Keywords/Search Tags: Diachronic corpus, Word frequency change, Word stability, Word sense change, Word embedding, Metaphor recognition