Named Entity Disambiguation Based On Chinese And English Wikipedia Knowledge Base

Posted on:2016-08-19

Degree:Master

Type:Thesis

Country:China

Candidate:N C Zuo

Full Text:PDF

GTID:2298330467492032

Subject:Electronics and Communications Engineering

Abstract/Summary:

Word disambiguation is an important task in natural language processing. Recently, Named Entity Linking (NEL) has been widely used to address it. NEL grounds entity mentions to their corresponding nodes in a Knowledge Base (KB). We studied several popular named entity disambiguation strategies and identified the differences between them. This thesis proposes a method for named entity disambiguation based on Wikipedia data. Both theory and experiments indicate that the method can be applied to Chinese as well as English. The main research contents and results are as follows:

1. We build a Chinese knowledge base and a source document collection. Following the structure of the TAC KBP English knowledge base, we build a Chinese knowledge base of over 3,740,000 entries from Chinese Wikipedia. The source document collection contains 328 documents covering 17 ambiguous mentions, which may refer to 61 entries in the KB.

2. We propose an approach for analyzing and extracting knowledge from Wikipedia and use it to build 8 separate datasets for entity disambiguation, including the dataset of entries' specification names, the dataset of redirect information, the dataset of disambiguation information, the dataset of linked entities, the dataset of popularity information, the dataset of each entity's context and the other entities near it, and the category dataset. Beyond named entity disambiguation, these 8 datasets can also support machine translation, information retrieval, web search, and intelligent systems.

3. We propose a method for analyzing candidate nodes. For each candidate node we compute a five-feature vector using PageRank, the vector space model (VSM), and other statistical learning techniques. The vector reflects the similarity between the node and the mention, together with the node's own popularity. The five features are string similarity, popularity, context similarity, relatedness between linked entities and queries, and category relatedness.

4. We propose a method that classifies candidates by applying a Decision Tree to the five-feature vector, and we build an Entity Linking system to verify it. We carried out the Entity Linking task on the TAC 2012 and TAC 2013 data and on the Chinese data. The method proved effective, raising F1 by 12 percent, and it can be widely applied.
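
To make the five-feature vector more concrete, the following is a minimal sketch of how such a vector might be computed for one candidate. It is not the thesis's implementation: the function names, the dictionary fields (`title`, `inlinks`, `text`, `links`, `categories`, `entities`), the toy data, and the specific measures (difflib ratio for string similarity, in-link share for popularity, bag-of-words cosine for the VSM context similarity, Jaccard overlap for the two relatedness features) are all assumptions chosen for illustration. In particular, popularity here is a simple in-link share rather than PageRank.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (a basic VSM)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_vector(mention, query, candidate, total_inlinks):
    """Hypothetical five-feature vector for one (mention, candidate) pair.

    Order: string similarity, popularity, context similarity,
    link relatedness, category relatedness.
    """
    string_sim = SequenceMatcher(
        None, mention.lower(), candidate["title"].lower()).ratio()
    # Assumed popularity measure: the candidate's share of in-links.
    popularity = candidate["inlinks"] / total_inlinks if total_inlinks else 0.0
    context_sim = cosine(Counter(query["text"].lower().split()),
                         Counter(candidate["text"].lower().split()))
    link_rel = jaccard(set(query["entities"]), set(candidate["links"]))
    category_rel = jaccard(set(query["categories"]),
                           set(candidate["categories"]))
    return [string_sim, popularity, context_sim, link_rel, category_rel]


# Toy data, invented purely for illustration.
query = {
    "text": "jordan scored 40 points for the bulls in the basketball game",
    "entities": {"Chicago Bulls", "NBA"},
    "categories": {"basketball"},
}
candidates = [
    {"title": "Michael Jordan", "inlinks": 300,
     "text": "american basketball player for the chicago bulls",
     "links": {"Chicago Bulls", "NBA", "North Carolina"},
     "categories": {"basketball", "people"}},
    {"title": "Jordan", "inlinks": 700,
     "text": "country in the middle east with capital amman",
     "links": {"Amman", "Middle East"},
     "categories": {"countries"}},
]
total = sum(c["inlinks"] for c in candidates)
vectors = {c["title"]: feature_vector("Jordan", query, c, total)
           for c in candidates}
```

In this toy case the country page wins on string similarity and popularity while the basketball player wins on context, link, and category features, which is the kind of conflict a classifier over the full vector (a Decision Tree, in the thesis) would be expected to resolve.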

Keywords/Search Tags:

Named Entity, Disambiguation, Knowledge Base, Wikipedia

Related items

1	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
2	Research And Implementation Of Named Entity Disambiguation Based On Wikipedia
3	Research On Graph Based Named Entity Disambiguation
4	Research On Multi-Source Named Entity Disambiguation Method For Researchers
5	Research And Implemention Of Name Entity Disambiguation
6	Chinese Named Entity Recognition And Disambiguation Research
7	Research On Named Entity Disambiguation In Deep Web
8	Research And Application Of The Chinese Organization Names Recognition And Disambiguation
9	Knowledge Mining Based On Statistical Snowball Models
10	Research On Named Entity Recognition And Disambiguation For Short Text