
Research on a Knowledge-Base-Based Word Representation Learning Method

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Y He
Full Text: PDF
GTID: 2428330626952394
Subject: Computer Science and Technology
Abstract/Summary:
Natural language processing is an important research direction in artificial intelligence, and word representation, as a basic tool of natural language processing, has become one of its research hotspots. The currently popular word representation learning approaches mainly follow the distributional hypothesis. Distributional semantic models built on this hypothesis rely heavily on large text corpora, which also limits the accuracy of the resulting word representations. On the one hand, research shows that word vectors trained on a domain-specific corpus perform better on certain tasks; on the other hand, the larger the corpus, the better the trained word vectors tend to be. How to offset the loss caused by enlarging the corpus while mixing in heterogeneous domains is therefore the focus and difficulty of this thesis.

To address this issue, this thesis proposes to use a knowledge base to strengthen word representations. Because a knowledge base provides precise semantic relations between lexical items, this property is used to compensate for the domain ambiguity among the contexts of a target word in a large corpus. First, the lexical relations provided by the knowledge base are added as a "precise" context to a popular distributional semantic model. Second, in contrast to adding the knowledge base to training as an undifferentiated whole, the strong and weak relations among keywords in the knowledge base are exploited to reweight the representation of the target word, further improving representation quality. Finally, for target words with multiple senses, a clustering algorithm is applied to the multi-word relation pairs provided by the knowledge base so that word representations are classified and trained per sense.

To evaluate the quality of the trained word vectors, we use not only the current standard test sets, including WordSim353, SimLex-999, and TOEFL, but also a new benchmark built in this thesis, IQ-Synonym-323. The performance of existing models and of the proposed model on these datasets shows both that our approach compares favorably with other word representations and that the new dataset is useful in practice.
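The idea of using knowledge-base relations as a "precise" context, weighted by relation strength, can be illustrated with a retrofitting-style update: each corpus-trained vector is pulled toward its knowledge-base neighbours, with per-edge weights standing in for strong versus weak relations. This is a minimal sketch in the spirit of the abstract, not the thesis's exact model; the function names, weights, and toy vectors below are hypothetical.

```python
def retrofit(vecs, kb_edges, iters=10, alpha=1.0):
    """Pull corpus-trained vectors toward their knowledge-base neighbours.

    vecs:     dict word -> list of floats (corpus-trained embedding)
    kb_edges: dict word -> list of (neighbour, weight); the weight encodes
              relation strength, e.g. a synonym edge heavier than a hypernym edge
    alpha:    how strongly the original corpus vector anchors the result
    """
    new = {w: list(v) for w, v in vecs.items()}
    for _ in range(iters):
        for w, nbrs in kb_edges.items():
            if w not in new:
                continue
            nbrs = [(n, b) for n, b in nbrs if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and the neighbours'
            # current vectors, normalised by the total edge weight.
            total = alpha + sum(b for _, b in nbrs)
            acc = [alpha * x for x in vecs[w]]
            for n, b in nbrs:
                acc = [a + b * x for a, x in zip(acc, new[n])]
            new[w] = [a / total for a in acc]
    return new

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sum(a * a for a in u) ** 0.5 * sum(b * b for b in v) ** 0.5)

# Toy demonstration with made-up vectors: the KB links "happy" and "glad",
# so their representations are pulled together; "sad" has no edges and stays put.
vecs = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "sad": [-1.0, 0.0]}
kb = {"happy": [("glad", 1.0)], "glad": [("happy", 1.0)]}
out = retrofit(vecs, kb)
```

Words without knowledge-base edges keep their corpus-trained vectors, so the update only spends the knowledge base's precision where it has something to say.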
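Benchmarks such as WordSim353 and SimLex-999 are scored in the standard way: compute the model's cosine similarity for each word pair and report the Spearman rank correlation against the human ratings. A small self-contained sketch of that protocol follows; the embeddings and gold scores are made up for illustration.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sum(a * a for a in u) ** 0.5 * sum(b * b for b in v) ** 0.5)

def ranks(vals):
    # Average ranks (1-based), with ties sharing their mean rank.
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation of the rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Made-up embeddings and human ratings for three word pairs.
emb = {"cup": [0.9, 0.1], "mug": [0.85, 0.15], "car": [0.1, 0.9], "auto": [0.2, 0.8]}
pairs = [("cup", "mug", 9.0), ("car", "auto", 8.5), ("cup", "car", 1.0)]
model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]
rho = spearman(model_scores, gold_scores)
```

Spearman (rather than Pearson) correlation is the conventional choice here because only the ranking of pairs matters, not whether the model's similarity scale matches the human rating scale.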
Keywords/Search Tags:Natural Language Processing, Word Representation Learning, Distribution Hypothesis, Knowledge Base, Synonymous Intelligence Test Dataset