
Research and Implementation of Named Entity Disambiguation

Posted on: 2018-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
Full Text: PDF
GTID: 2348330518996708
Subject: Computer technology
Abstract/Summary:
With the continuous development of network technology, new means of communication keep emerging, and with them a large volume of unstructured text on the web. Making better use of this valuable data places higher demands on natural language processing (NLP) technology. As a foundation for machine translation, information retrieval, and other NLP applications, named entity disambiguation has attracted increasing attention.

Named entity disambiguation methods fall mainly into single-entity disambiguation and collaborative disambiguation. Single-entity disambiguation focuses on one named entity at a time and ignores its semantic relationships with the other entities in the text. Collaborative disambiguation assumes that the named entities in a text are semantically related. Graph-model-based collaborative disambiguation in particular has recently become a research hot spot, because it is effective and does not require large-scale training or labeled text. However, existing graph-based collaborative methods do not make use of semantic relatedness, and their handling of unknown entities is too simple, so there is still much room for improvement.

In summary, named entity disambiguation technology remains inadequate in some respects. This thesis makes several improvements; its main contributions are as follows:

1. Based on the LDA model and the idea of local community discovery, the thesis proposes a named entity disambiguation method. The method uses the LDA topic model to calculate the semantic relatedness among candidate entities and so explore the deeper semantic relations among them. In addition, it uses a local community discovery algorithm based on personalized PageRank to search efficiently for the optimal subgraph, which is taken as the disambiguation result.

2. Based on multiple information sources, the thesis proposes a new named entity classification prediction method, which supplements the disambiguation result. Existing methods extract candidate categories from a single information source. This is simple, but the candidate categories are not rich and the results are not good enough, so the thesis draws on multiple information sources to improve them.

3. An external knowledge base is constructed from offline English Wikipedia data and serves as the external information source for named entity disambiguation and entity classification prediction. It consists of six key information bases, mainly including knowledge bases of title disambiguation information, link redirection information, and named entity categories.

4. The two methods are combined to implement a named entity disambiguation system, and the validity of the system is verified through tests on an open data set.
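The idea behind contribution 1 can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes each candidate entity already has a topic distribution (for example, from an LDA model), uses cosine similarity between those distributions as the edge weight, and replaces the search for an optimal subgraph with a simpler rule: keep the candidate with the highest personalized PageRank score for each mention. All names, the toy topic vectors, and the parameter `alpha` are hypothetical.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def build_graph(topics, mention_of):
    """Connect candidates of different mentions, weighted by topic similarity."""
    graph = {c: {} for c in topics}
    for u in topics:
        for v in topics:
            if u != v and mention_of[u] != mention_of[v]:
                graph[u][v] = cosine(topics[u], topics[v])
    return graph

def personalized_pagerank(graph, seeds, alpha=0.15, iters=100):
    """Power iteration; alpha is the probability of restarting at a seed."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in graph}
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: send its mass back to the seeds.
                for n in graph:
                    nxt[n] += (1 - alpha) * rank[u] * restart[n]
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - alpha) * rank[u] * w / total
        rank = nxt
    return rank

def disambiguate(topics, mention_of, seeds):
    """Simplification: pick the top-ranked candidate for each ambiguous mention."""
    rank = personalized_pagerank(build_graph(topics, mention_of), seeds)
    best = {}
    for cand, mention in mention_of.items():
        if cand in seeds:
            continue
        if mention not in best or rank[cand] > rank[best[mention]]:
            best[mention] = cand
    return best

# Toy topic vectors over (sports, geography, animals); values are made up.
topics = {
    "NBA": [0.90, 0.05, 0.05],
    "Michael_Jordan": [0.85, 0.10, 0.05],
    "Jordan_(country)": [0.05, 0.90, 0.05],
    "Chicago_Bulls": [0.90, 0.05, 0.05],
    "Bull_(animal)": [0.05, 0.05, 0.90],
}
mention_of = {
    "NBA": "NBA",
    "Michael_Jordan": "Jordan",
    "Jordan_(country)": "Jordan",
    "Chicago_Bulls": "Bulls",
    "Bull_(animal)": "Bulls",
}
rank = personalized_pagerank(build_graph(topics, mention_of), {"NBA"})
result = disambiguate(topics, mention_of, seeds={"NBA"})
print(result)
```

Here the unambiguous entity `NBA` acts as the seed, so candidates that are topically close to it accumulate more rank than their competitors. A fuller treatment would extract a dense subgraph around the seeds rather than rank candidates independently.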
Keywords/Search Tags:named entity disambiguation, topic model, local community discovery, Personalized PageRank, entity typing