
Research on a Knowledge-Base-Based Word Representation Learning Method

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Y He
Full Text: PDF
GTID: 2428330626952394
Subject: Computer Science and Technology
Abstract/Summary:
Natural language processing is an important research direction in artificial intelligence, and word representation, as a basic tool of natural language processing, has become one of its research hotspots. The currently popular word representation learning approaches mainly follow the distributional hypothesis. Distributional semantic models built on this hypothesis rely heavily on large text corpora, which also limits the accuracy of the resulting word representations. On the one hand, research shows that word vectors trained on a domain-specific corpus perform better on certain tasks; on the other hand, the larger the corpus, the better the trained word vectors tend to be. How to offset the loss caused by enlarging the corpus while mixing in heterogeneous domains is therefore the focus and difficulty of this thesis.

To address this issue, this thesis proposes to use a knowledge base to strengthen word representations. Because a knowledge base provides precise semantic relations between lexical items, this property is used to compensate for the domain ambiguity among the contexts of a target word in a large corpus. First, the lexical relations provided by the knowledge base are added as a "precise" context to a popular distributional semantic model. Second, in contrast to adding the knowledge base to training as an undifferentiated whole, the strong and weak relations among keywords in the knowledge base are exploited to reweight the representation of the target word, further improving representation quality. Finally, for target words with multiple senses, a clustering algorithm is applied to the multi-word relation pairs provided by the knowledge base so that word representations are classified and trained per sense.

To evaluate the quality of the trained word vectors, we use not only the current standard test sets, including WordSim353, SimLex-999, and TOEFL, but also a new benchmark built in this thesis, IQ-Synonym-323. The performance of existing models and of the proposed model on these datasets shows both that our approach compares favorably with other word representations and that the new dataset is useful in practice.
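The idea of using knowledge-base relations as a "precise" context, weighted by relation strength, can be illustrated with a retrofitting-style update: each corpus-trained vector is pulled toward its knowledge-base neighbours, with per-edge weights standing in for strong versus weak relations. This is a minimal sketch in the spirit of the abstract, not the thesis's exact model; the function names, weights, and toy vectors below are hypothetical.

```python
def retrofit(vecs, kb_edges, iters=10, alpha=1.0):
    """Pull corpus-trained vectors toward their knowledge-base neighbours.

    vecs:     dict word -> list of floats (corpus-trained embedding)
    kb_edges: dict word -> list of (neighbour, weight); the weight encodes
              relation strength, e.g. a synonym edge heavier than a hypernym edge
    alpha:    how strongly the original corpus vector anchors the result
    """
    new = {w: list(v) for w, v in vecs.items()}
    for _ in range(iters):
        for w, nbrs in kb_edges.items():
            if w not in new:
                continue
            nbrs = [(n, b) for n, b in nbrs if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and the neighbours'
            # current vectors, normalised by the total edge weight.
            total = alpha + sum(b for _, b in nbrs)
            acc = [alpha * x for x in vecs[w]]
            for n, b in nbrs:
                acc = [a + b * x for a, x in zip(acc, new[n])]
            new[w] = [a / total for a in acc]
    return new

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sum(a * a for a in u) ** 0.5 * sum(b * b for b in v) ** 0.5)

# Toy demonstration with made-up vectors: the KB links "happy" and "glad",
# so their representations are pulled together; "sad" has no edges and stays put.
vecs = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "sad": [-1.0, 0.0]}
kb = {"happy": [("glad", 1.0)], "glad": [("happy", 1.0)]}
out = retrofit(vecs, kb)
```

Words without knowledge-base edges keep their corpus-trained vectors, so the update only spends the knowledge base's precision where it has something to say.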
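Benchmarks such as WordSim353 and SimLex-999 are scored in the standard way: compute the model's cosine similarity for each word pair and report the Spearman rank correlation against the human ratings. A small self-contained sketch of that protocol follows; the embeddings and gold scores are made up for illustration.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sum(a * a for a in u) ** 0.5 * sum(b * b for b in v) ** 0.5)

def ranks(vals):
    # Average ranks (1-based), with ties sharing their mean rank.
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation of the rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Made-up embeddings and human ratings for three word pairs.
emb = {"cup": [0.9, 0.1], "mug": [0.85, 0.15], "car": [0.1, 0.9], "auto": [0.2, 0.8]}
pairs = [("cup", "mug", 9.0), ("car", "auto", 8.5), ("cup", "car", 1.0)]
model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]
rho = spearman(model_scores, gold_scores)
```

Spearman (rather than Pearson) correlation is the conventional choice here because only the ranking of pairs matters, not whether the model's similarity scale matches the human rating scale.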
Keywords/Search Tags:Natural Language Processing, Word Representation Learning, Distribution Hypothesis, Knowledge Base, Synonymous Intelligence Test Dataset