
Entity Linking Model Based On Integrated Training

Posted on: 2020-09-03    Degree: Master    Type: Thesis
Country: China    Candidate: B B Liu    Full Text: PDF
GTID: 2428330590473260    Subject: Software engineering
Abstract/Summary:
With the development of the Internet, people use many software products in their daily lives. The number of users of various websites has grown rapidly, generating a huge volume of Web text that may contain valuable information. Different people have different writing styles and different ways of expressing the same meaning, and natural language itself is diverse, so polysemous words appear frequently in text data. When a machine processes large amounts of text, this ambiguity is an obstacle to its understanding of natural language. To address word sense disambiguation, researchers have tried many different data sources and algorithms. This thesis focuses on the phenomenon of polysemy in text: Wikipedia data is used as the knowledge base, and, given an ambiguous mention and its context information, the entity that interprets the mention is retrieved from the knowledge base.

To solve the polysemy problem, this thesis proposes a model that integrates several training methods. In the entity linking method presented here, the representation vector of a mention is generated mainly from the mention word itself, the mention's context, and the document containing the mention, while the entity representation vector is generated mainly from the entity name and the entity's document. Entity linking is then performed by measuring the similarity between the mention representation vector and the entity representation vector.

Text similarity is measured with the edit distance algorithm and the vector space model. Because the dimensionality of the vector space model is very high and the resulting data are sparse, the experiments cluster the text data, mainly with the K-means algorithm and the agglomerative hierarchical clustering algorithm. Clustering reduces the dimensionality of the mention and entity representation vectors and improves the experimental model to some extent; it not only alleviates data sparseness but also yields a category-level representation of the text.

Finally, this thesis uses a neural network to combine the preceding methods, exploiting mention information at different granularities. Documents and words are vectorized with the doc2vec and word2vec models, respectively. The test results show that combining these methods and using mention information at different granularities improves the entity linking model to a certain extent.
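The two similarity measures named above can be sketched in a few lines of Python. This is an illustrative implementation, not the thesis's own code: the mention text and the two candidate entity descriptions are invented examples, and the vector space model is reduced to raw term-frequency vectors with cosine similarity.

```python
from collections import Counter
from math import sqrt

def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Toy linking step: score a mention's context against candidate entity documents.
mention = "apple released a new phone"
candidates = {
    "Apple Inc.":    "apple is a technology company that makes the iphone phone",
    "Apple (fruit)": "the apple is an edible fruit of the apple tree",
}
scores = {name: cosine_similarity(mention, doc) for name, doc in candidates.items()}
best = max(scores, key=scores.get)
```

In the thesis the candidate set comes from Wikipedia and the vectors are richer, but the decision rule has this shape: the candidate entity whose representation is most similar to the mention's representation is selected.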
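The clustering step can likewise be sketched. The following is a minimal K-means written against toy 2-D "document vectors" (the thesis works in a much higher-dimensional, sparse space); the deterministic first-k initialization is a simplification for the example, not the thesis's initialization strategy.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Minimal K-means: returns final centroids and a cluster label per point."""
    centroids = list(points[:k])  # simple deterministic initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Two well-separated groups standing in for mention/entity representation vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
```

The cluster label (or distance to each centroid) is what gives the "category representation" the abstract mentions: a point in a huge sparse space is replaced by its membership in one of k clusters, which is the dimensionality reduction the experiments rely on.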
Keywords/Search Tags: entity linking, word sense disambiguation, Wikipedia, K-means clustering, vector space model, doc2vec