
Entity Linking Model Based On Integrated Training

Posted on: 2020-09-03    Degree: Master    Type: Thesis
Country: China    Candidate: B B Liu    Full Text: PDF
GTID: 2428330590473260    Subject: Software engineering
Abstract/Summary:
With the development of the Internet, people use many software products in their daily lives. The number of users of various websites has grown rapidly, generating a huge volume of Web text that may contain valuable information. Different people have different writing styles and different ways of expressing the same meaning, and natural language itself is diverse, so polysemous words appear frequently in text data. When a machine processes large amounts of text, this ambiguity is an obstacle to its understanding of natural language. To address word sense disambiguation, researchers have tried many different data sources and algorithms. This thesis focuses on the phenomenon of polysemy in text: Wikipedia data is used as the knowledge base, and, given an ambiguous mention and its context information, the entity that interprets the mention is retrieved from the knowledge base.

To solve the polysemy problem, this thesis proposes a model that integrates several training methods. In the entity linking method presented here, the representation vector of a mention is generated mainly from the mention word itself, the mention's context, and the document containing the mention, while the entity representation vector is generated mainly from the entity name and the entity's document. Entity linking is then performed by measuring the similarity between the mention representation vector and the entity representation vector.

Text similarity is measured with the edit distance algorithm and the vector space model. Because the dimensionality of the vector space model is very high and the resulting data are sparse, the experiments cluster the text data, mainly with the K-means algorithm and the agglomerative hierarchical clustering algorithm. Clustering reduces the dimensionality of the mention and entity representation vectors and improves the experimental model to some extent; it not only alleviates data sparseness but also yields a category-level representation of the text.

Finally, this thesis uses a neural network to combine the preceding methods, exploiting mention information at different granularities. Documents and words are vectorized with the doc2vec and word2vec models, respectively. The test results show that combining these methods and using mention information at different granularities improves the entity linking model to a certain extent.
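The two similarity measures named above can be sketched in a few lines of Python. This is an illustrative implementation, not the thesis's own code: the mention text and the two candidate entity descriptions are invented examples, and the vector space model is reduced to raw term-frequency vectors with cosine similarity.

```python
from collections import Counter
from math import sqrt

def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Toy linking step: score a mention's context against candidate entity documents.
mention = "apple released a new phone"
candidates = {
    "Apple Inc.":    "apple is a technology company that makes the iphone phone",
    "Apple (fruit)": "the apple is an edible fruit of the apple tree",
}
scores = {name: cosine_similarity(mention, doc) for name, doc in candidates.items()}
best = max(scores, key=scores.get)
```

In the thesis the candidate set comes from Wikipedia and the vectors are richer, but the decision rule has this shape: the candidate entity whose representation is most similar to the mention's representation is selected.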
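The clustering step can likewise be sketched. The following is a minimal K-means written against toy 2-D "document vectors" (the thesis works in a much higher-dimensional, sparse space); the deterministic first-k initialization is a simplification for the example, not the thesis's initialization strategy.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Minimal K-means: returns final centroids and a cluster label per point."""
    centroids = list(points[:k])  # simple deterministic initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Two well-separated groups standing in for mention/entity representation vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
```

The cluster label (or distance to each centroid) is what gives the "category representation" the abstract mentions: a point in a huge sparse space is replaced by its membership in one of k clusters, which is the dimensionality reduction the experiments rely on.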
Keywords/Search Tags: entity linking, word sense disambiguation, Wikipedia, K-means clustering, vector space model, doc2vec