Research And Implementation Of Named Entity Disambiguation Based On Wikipedia

Posted on: 2015-12-02  Degree: Master  Type: Thesis
Country: China  Candidate: X Yang  Full Text: PDF
GTID: 2298330467463520  Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, large volumes of unstructured data are generated on the web. How to extract useful information from this data has become an open problem in the NLP field. Because of ambiguity, systems cannot accurately determine the meaning of a text, which restricts the development of related technologies. The study of disambiguation is therefore of real significance.

Named Entity Disambiguation involves many key technologies, including feature extraction, ranking, and clustering. Through an in-depth study of these problems, this paper proposes a Named Entity Disambiguation method based on Wikipedia and builds a prototype system. The system performs well on the two datasets Entity Linking 2012 and Entity Linking 2011, achieving F-measures of 0.670 and 0.746 respectively. The main contributions of this paper are as follows:

1. This paper proposes an acronym expansion method based on multiple rules. Most previous methods used only a small number of rules. Because natural language is flexible and diverse, those methods do not suit all abbreviations and acronyms. This paper therefore uses a variety of rules to expand acronyms.
2. This paper extracts a variety of features to describe mentions and the relationships between mentions and candidates. Most studies focus on surface features, which are limited by the widespread phenomenon of polysemy. This paper therefore extracts semantic features in addition to surface features.
3. This paper uses Learning to Rank to rank the candidates. Traditional ranking methods are simple and easy to debug, but they use only a few features, which keeps systems from reaching high performance. This paper therefore applies Learning to Rank methods in the candidate ranking step.
4. This paper proposes a Named Entity Disambiguation method that combines Entity Linking and Clustering. This method compensates for the shortcomings of earlier methods that used only Entity Linking or only Clustering.
Keywords/Search Tags: named entity disambiguation, entity linking, learning to rank, clustering, feature extraction
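
The abstract does not list the rules used for acronym expansion, so the following is only a minimal sketch of what a rule-based expander might look like. The two rules (strict initial-letter matching, and initial-letter matching that skips stop words), the `STOP_WORDS` list, and the function names `expand_acronym` and `_initials` are all illustrative assumptions, not the thesis's actual rule set.

```python
import re

# Assumed stop-word list; the thesis does not specify one.
STOP_WORDS = {"of", "the", "and", "for", "in", "on"}


def _initials(tokens, skip_stop_words):
    """Join the upper-cased first letter of each token, optionally skipping stop words."""
    letters = []
    for tok in tokens:
        if skip_stop_words and tok.lower() in STOP_WORDS:
            continue
        letters.append(tok[0].upper())
    return "".join(letters)


def expand_acronym(acronym, text, max_extra=3):
    """Return candidate expansions of `acronym` found in `text`.

    Hypothetical rules (illustrative only):
      Rule 1: the initials of consecutive words spell the acronym.
      Rule 2: the same, but stop words inside the phrase are ignored.
    Candidates are returned in order of first occurrence, without duplicates.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    target = acronym.upper()
    n = len(target)
    candidates = []
    for start in range(len(tokens)):
        # Windows may be longer than the acronym because Rule 2 skips stop words.
        for length in range(n, n + max_extra + 1):
            window = tokens[start:start + length]
            if len(window) < length:
                break  # ran past the end of the text
            # An expansion should begin and end with a content word.
            if window[0].lower() in STOP_WORDS or window[-1].lower() in STOP_WORDS:
                continue
            if _initials(window, False) == target or _initials(window, True) == target:
                phrase = " ".join(window)
                if phrase not in candidates:
                    candidates.append(phrase)
    return candidates


if __name__ == "__main__":
    sample = ("The Department of Defense (DoD) and the World Health "
              "Organization (WHO) issued statements.")
    print(expand_acronym("DoD", sample))
    print(expand_acronym("WHO", sample))
    print(expand_acronym("USA", "She moved to the United States of America."))
```

Keeping each rule as a separate check makes it easy to add further rules, which is the motivation the abstract gives for using many rules. In a full system the returned candidates would feed the candidate generation and ranking steps.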