Research And Implementation Of Named Entity Disambiguation Based On Wikipedia

Posted on:2015-12-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Yang

Full Text:PDF

GTID:2298330467463520

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology, large volumes of unstructured data are being generated on the web. How to extract useful information from this data has become a problem that the NLP field needs to solve. Because of ambiguity, systems cannot accurately determine the meaning of a text, which restricts the development of related technologies. The study of disambiguation is therefore of considerable importance.

Named Entity Disambiguation involves several key technologies, including feature extraction, ranking, and clustering. Based on an in-depth study of these problems, this paper proposes a Named Entity Disambiguation method based on Wikipedia and builds a prototype system. The system performs well on the two datasets Entity Linking 2012 and Entity Linking 2011, achieving F-measures of 0.670 and 0.746, respectively. The main contributions of this paper are as follows:

1. An acronym expansion method based on a large set of rules. Most previous methods used only a small number of rules. Because natural language is flexible and diverse, those methods do not suit all abbreviations and acronyms. This paper therefore uses a variety of rules to expand acronyms.
2. A variety of features that describe mentions and the relationships between mentions and candidates. Most studies focus on extracting surface features, which are limited by the widespread phenomenon of polysemy. This paper therefore extracts semantic features in addition to surface features.
3. Learning to Rank for ranking the candidates. Traditional ranking methods are simple and easy to debug, but they use only a few features, which keeps systems from reaching high performance. This paper therefore applies Learning to Rank methods in the candidate ranking step.
4. A Named Entity Disambiguation method that combines Entity Linking and Clustering. This method effectively compensates for the shortcomings of previous methods, which used only Entity Linking or only Clustering.
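
To make the idea of rule-based acronym expansion concrete, the following is a minimal Python sketch. The abstract does not list the actual rules used, so the three rules here (a parenthetical definition, an exact initial-letter match, and an initial-letter match that skips stopwords), the stopword list, and all function names are illustrative assumptions, not the author's method.

```python
import re

# Hypothetical stopword list used by the third rule; not taken from the thesis.
STOPWORDS = {"of", "the", "and", "for", "in", "on", "to"}


def tokenize(text):
    """Split text into alphabetic word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def rule_parenthetical(acronym, text):
    """Assumed rule 1: for 'Long Form (ACR)', take the last len(ACR) words before '(ACR)'."""
    found = []
    for match in re.finditer(r"\(" + re.escape(acronym) + r"\)", text):
        preceding = tokenize(text[:match.start()])
        if len(preceding) >= len(acronym):
            found.append(" ".join(preceding[-len(acronym):]))
    return found


def rule_exact_initials(acronym, tokens):
    """Assumed rule 2: consecutive words whose initials spell the acronym."""
    n = len(acronym)
    target = acronym.upper()
    found = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if "".join(w[0].upper() for w in window) == target:
            found.append(" ".join(window))
    return found


def rule_skip_stopwords(acronym, tokens):
    """Assumed rule 3: like rule 2, but stopwords inside the span do not count."""
    target = acronym.upper()
    found = []
    for i in range(len(tokens)):
        if tokens[i].lower() in STOPWORDS:
            continue  # an expansion should not start with a stopword
        letters, j = "", i
        while j < len(tokens) and len(letters) < len(target):
            if tokens[j].lower() not in STOPWORDS:
                letters += tokens[j][0].upper()
            j += 1
        if letters == target:
            found.append(" ".join(tokens[i:j]))
    return found


def expand_acronym(acronym, text):
    """Apply every rule in turn and return the unique candidates in order."""
    tokens = tokenize(text)
    candidates = []
    for found in (
        rule_parenthetical(acronym, text),
        rule_exact_initials(acronym, tokens),
        rule_skip_stopwords(acronym, tokens),
    ):
        for candidate in found:
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

Calling `expand_acronym("USA", text)` on a document that contains "United States of America" returns that phrase through the stopword-skipping rule, even though the exact initial-letter rule fails on it. This is the kind of case where a small rule set misses an expansion and a larger one recovers it.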

Keywords/Search Tags:

named entity disambiguation, entity linking, learning to rank, clustering, feature extraction

Related items

1	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
2	Design And Implementation Of Entity Linking System For Chinese Novels
3	Named Entity Linking Based On Multisource Knowledge
4	Research On Document Oriented Entity Linking Method
5	Research For Algorithm Of Chinese Entity Linking Technology Based On Topic Relation Graph
6	The Multi-strategic Research Of Chinese Weibo Entity And Wikipedia Entry Linking
7	Research On Entity Linking Algorithm Based On End-to-end Joint Disambiguation
8	Research On Several Key Issues On TAC-KBP Evaluation
9	Entity Linking Algorithm Research And System Implementation Based On Wikipedia
10	Learning for information extraction: From named entity recognition and disambiguation to relation extraction