
Research On Word Meaning Change Based On Deep Learning

Posted on: 2022-07-18  Degree: Master  Type: Thesis
Country: China  Candidate: J J Xiong  Full Text: PDF
GTID: 2558306347451044  Subject: Computer application technology
Abstract/Summary:
Since the start of the 21st century, China's economy, science and technology, politics, culture, and other fields have been changing rapidly, and people's ideology, value orientation, and lifestyle have changed as well. As a product of society, language reflects social development. Of the three elements of language, vocabulary changes most rapidly as society develops and best reflects social development and social phenomena. Vocabulary change takes four main forms: the emergence of new word meanings, the disappearance of old word meanings, the emergence of new words, and the disappearance of old words. Word meaning change is an unavoidable problem in intelligent problem solving for Chinese, because it makes questions harder for such systems to understand. Techniques for recognizing word meaning change can help an intelligent problem-solving system disambiguate and understand questions better. Cutting-edge intelligent Chinese education solutions mostly use deep learning pre-trained language models, but such a model has only a single basic word embedding representation, which cannot accurately represent a word's vector in different time periods. If the embedding representation can be extended by time period, the model obtains more accurate word vector information, which improves performance on downstream tasks. Obtaining different word representations for different time periods is therefore of considerable research value for intelligent education. Based on the diachronic People's Daily corpus (1946-2020), this paper studies word meaning change from the perspective of deep learning. The specific contents are as follows:

(1) The traditional Skip-gram-based method for recognizing word meaning change cannot compare word vectors well when they are more than one period apart. It relies too heavily on the word vectors of the previous period, which degrades the comparison of two groups of word vectors separated by more than one period. To address this shortcoming, this paper proposes an improved Skip-gram-based method: first train on the corpus of all time periods, use the resulting word vectors as the initialization parameters of the word vector model for each time period, and then fine-tune each period's model on the corpus of that period. To verify the improved model, this paper uses Fasttext for a multi-class classification task. For each input sample, the word vector table corresponding to the sample's year is looked up, and the sample is converted into word vectors for classification. Initializing Fasttext with the word vector sets produced by this method gives an average classification accuracy 1.6% higher than with the traditional method, which verifies the effectiveness of the proposed algorithm.

(2) Most of the data in the People's Daily corpus has missing or irregular domain labels, which makes it impossible to analyze domain changes for words whose meaning changes. This paper therefore proposes a People's Daily domain text classifier, BERT-wwm+CNN, to expand the labeled data so that these domain changes can be analyzed. A performance comparison between the proposed BERT-wwm+CNN classifier and typical text classification models shows that combining the BERT-wwm model with a CNN layer works best, reaching an F1-score of 93.33% for domain classification on the People's Daily data set. The trained BERT-wwm+CNN was then used to label 1,440,354 samples with missing or irregular labels, expanding the data set. By counting the number of articles in each domain in each time period of the People's Daily corpus, this paper analyzes whether a word's domain changes, the time period in which the change occurs, and the type of change.
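The training schedule in contribution (1) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the thesis's implementation: the tiny English corpus, the function names (`init_vectors`, `train_skipgram`, `train_per_period`), the use of negative sampling, and all hyperparameters are hypothetical. It shows only the idea of training once on all periods, copying that model as the starting point for every period, and fine-tuning each copy on its own period.

```python
import copy
import math
import random


def init_vectors(vocab, dim, rng):
    """Small random input ("in") and output ("out") vectors for every word."""
    def table():
        return {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    return {"in": table(), "out": table()}


def train_skipgram(model, sentences, rng, epochs=20, lr=0.05, window=2, negatives=2):
    """Toy skip-gram with negative sampling (an assumed variant); updates model in place."""
    vocab = list(model["in"])
    for _ in range(epochs):
        for sent in sentences:
            for i, center in enumerate(sent):
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # One true context word plus a few uniformly drawn negatives.
                    # A negative may coincide with a true context word; fine for a toy.
                    targets = [(sent[j], 1.0)]
                    targets += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    v = model["in"][center]
                    grad_v = [0.0] * len(v)
                    for word, label in targets:
                        u = model["out"][word]
                        score = sum(a * b for a, b in zip(v, u))
                        score = max(-10.0, min(10.0, score))  # keep exp() well-behaved
                        pred = 1.0 / (1.0 + math.exp(-score))
                        g = lr * (label - pred)
                        for k in range(len(v)):
                            grad_v[k] += g * u[k]
                            u[k] += g * v[k]
                    for k in range(len(v)):
                        v[k] += grad_v[k]
    return model


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def train_per_period(corpora, dim=8, seed=0):
    """Train on all periods, then fine-tune one copy of that model per period."""
    rng = random.Random(seed)
    all_sents = [s for sents in corpora.values() for s in sents]
    vocab = sorted({w for s in all_sents for w in s})
    base = train_skipgram(init_vectors(vocab, dim, rng), all_sents, rng)
    period_models = {}
    for period, sents in corpora.items():
        # Every period starts from the same base, not from the previous period.
        period_models[period] = train_skipgram(copy.deepcopy(base), sents, rng)
    return base, period_models


# Hypothetical toy corpus: "mouse" shifts from animal to computer device.
corpora = {
    "1950s": [["mouse", "eats", "cheese"], ["cat", "chases", "mouse"]],
    "1980s": [["mouse", "eats", "grain"], ["cat", "chases", "mouse"]],
    "2010s": [["click", "mouse", "button"], ["mouse", "and", "keyboard"]],
}
base, models = train_per_period(corpora)
drift = cosine(models["1950s"]["in"]["mouse"], models["2010s"]["in"]["mouse"])
print(f"cosine(mouse@1950s, mouse@2010s) = {drift:.3f}")
```

Because every period model is fine-tuned from the same starting point, vectors from any two periods share one space and can be compared directly, even when the periods are not adjacent. In this sketch a word that never occurs in a period keeps its base input vector for that period.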
Keywords/Search Tags: word meaning change, word vector, text classification, deep learning