Research On Feature Enhancement Methods For Entity Disambiguation

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z H He
GTID: 2518306536463704
Subject: Computer Science and Technology
Abstract/Summary:
With the explosion of text on the Internet, the demand for large-scale information collection and use grows daily. However, entity mentions in text often share the same name or are otherwise ambiguous, which greatly hinders understanding of the information. Entity disambiguation is the key to solving these problems. As a key technology in natural language processing, it has attracted many researchers; it aims to eliminate ambiguity by mapping ambiguous entity mentions in text to entries in a large, unambiguous knowledge base.

After surveying related work in China and abroad, this thesis targets two problems: inaccurate coherence feature extraction in long-text entity disambiguation, and insufficient context information together with missing entity similarity information in short-text entity disambiguation. It proposes two models: a collective entity disambiguation model (GNCED) based on deep semantic mention neighbors and heterogeneous entity correlation, and a short-text entity disambiguation model based on multiple text similarity (MTS-STED).

(1) To address the inaccurate extraction of coherence features in document-level entity disambiguation, the collective entity disambiguation model GNCED is proposed based on the assumption of local topical coherence. To construct more accurate entity coherence features, a mention neighbor selection strategy based on deep semantic distance is proposed, which aims to capture both the long-distance dependencies between entity mentions and the internal semantic associations among entities in the text. In addition, a "simple to complex" idea is introduced to enrich the entity-related information in the entity correlation graph and to let low-ambiguity and high-ambiguity mentions reinforce each other during disambiguation.

(2) To handle short texts, which lack both context information and global information, the short-text entity disambiguation model MTS-STED is proposed based on multiple text similarity feature extraction. In candidate entity generation, several generation methods are combined to improve the overall quality of the candidate entity set. Then, building on the classical text representation model, the thesis proposes a linguistically driven etymological semantic representation method that mines deep semantic information from the context items other than the entity mentions. To better distinguish the meanings of different entities, the model also adopts a recent entity embedding representation method. On top of these semantic representations, multiple text similarity features are constructed (character matching, word-level matching, etymological matching, sentence-level matching, and entity matching), and short-text disambiguation is performed by fusing these features.

Several rounds of experimental comparison with classical disambiguation algorithms show that both models have clear advantages over those algorithms: the GNCED model, whose entity coherence measure is built on semantic mention neighbor selection and entity correlation graph construction, and the MTS-STED model, which is built on multi-layer semantic text representation and multiple text similarity features. Both improve overall disambiguation performance.
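The fusion of multiple text similarity features described for MTS-STED can be illustrated with a minimal sketch. This is not the thesis's implementation: the three features (character, word-level, and embedding similarity), the fixed weights, the weighted-sum fusion, and the names `score_candidate` and `disambiguate` are all assumptions chosen for illustration, and the toy vectors stand in for learned context and entity embeddings.

```python
import math
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level matching: ratio of matching characters (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def word_jaccard(a: str, b: str) -> float:
    """Word-level matching: Jaccard overlap of lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u, v) -> float:
    """Embedding-level matching: cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score_candidate(context, context_vec, candidate, weights=(0.2, 0.3, 0.5)):
    """Fuse the similarity features with a weighted sum (hypothetical weights)."""
    features = (
        char_similarity(context, candidate["description"]),
        word_jaccard(context, candidate["description"]),
        cosine(context_vec, candidate["vector"]),
    )
    return sum(w * f for w, f in zip(weights, features))


def disambiguate(context, context_vec, candidates):
    """Return the name of the candidate entity with the highest fused score."""
    best = max(candidates, key=lambda c: score_candidate(context, context_vec, c))
    return best["name"]


# Toy data: the vectors are made up, not real embeddings.
candidates = [
    {"name": "Apple Inc.",
     "description": "technology company that makes phone and computer products",
     "vector": [1.0, 0.0]},
    {"name": "Apple (fruit)",
     "description": "edible fruit of the apple tree",
     "vector": [0.0, 1.0]},
]
print(disambiguate("Apple released a new phone", [0.9, 0.1], candidates))
```

A weighted sum is only one possible fusion; a trained classifier or ranker over the same feature vector would be a natural alternative.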
Keywords/Search Tags: Entity disambiguation, Local topical coherence, Entity correlation graph, Short text representation, Text similarity