Research On Name Disambiguation In The Field Of Journalism

Posted on:2011-05-11

Degree:Master

Type:Thesis

Country:China

Candidate:C Li

Full Text:PDF

GTID:2248330395958348

Subject:Computer software and theory

Abstract/Summary:

Name disambiguation is one of the most important problems in information retrieval, data mining, and related fields. Considerable progress has been made and a large number of name disambiguation algorithms have been proposed. However, because of differences in application areas and data, and because of the complexity of the task itself, many problems remain. We first apply a traditional clustering algorithm to name disambiguation. To exploit background knowledge, we then propose a name disambiguation method based on a framework of people's attributes and relationships, and a two-stage name disambiguation method based on exclusive and non-exclusive.
We use the traditional hierarchical clustering algorithm for the name disambiguation task, transforming it into a document clustering task and improving the feature selection method. We run comparative experiments on different weight calculation methods and different inter-cluster distance calculation methods. However, name disambiguation based on traditional clustering uses only words as features and does not distinguish among them. We therefore propose a similarity calculation method based on named entities and entity words that combines the features with different weights. On the task in this paper, the single-link method together with the similarity calculation based on named entities and entity words gives better performance.
However, because document themes are diverse and people have their own characteristics, a person category cannot always be represented by the document theme, or the document theme is unclear. This paper therefore proposes a name disambiguation method based on a framework of people's attributes and relationships. We first identify the attributes and relevant entities, and then use background knowledge to judge whether the persons mentioned are the same person.
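The single-link clustering with a weighted entity and word similarity described above can be illustrated with a minimal sketch. It is not the thesis implementation: the cosine measure, the `entity_weight` of 0.7, the stopping `threshold` of 0.3, and the document layout with `entities` and `words` lists are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def doc_similarity(d1, d2, entity_weight=0.7):
    """Weighted mix of entity-based and word-based similarity.

    The weight value is a hypothetical choice, not taken from the thesis.
    """
    ent = cosine(Counter(d1["entities"]), Counter(d2["entities"]))
    wrd = cosine(Counter(d1["words"]), Counter(d2["words"]))
    return entity_weight * ent + (1.0 - entity_weight) * wrd


def single_link_clusters(docs, threshold=0.3, entity_weight=0.7):
    """Agglomerative clustering with single linkage.

    The similarity of two clusters is the highest similarity between any
    pair of their documents. Merging stops once the best remaining pair
    of clusters falls below the (assumed) threshold.
    """
    clusters = [[i] for i in range(len(docs))]
    sim = {
        (i, j): doc_similarity(docs[i], docs[j], entity_weight)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
    }

    def link(c1, c2):
        return max(sim[(min(i, j), max(i, j))] for i in c1 for j in c2)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = link(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy documents that all mention the same ambiguous name (invented data).
docs = [
    {"entities": ["Beijing", "Tsinghua University"], "words": ["professor", "research", "physics"]},
    {"entities": ["Tsinghua University", "Beijing"], "words": ["physics", "lecture", "professor"]},
    {"entities": ["Shanghai", "Olympic Games"], "words": ["athlete", "medal", "swimming"]},
]
clusters = single_link_clusters(docs)
print(clusters)
```

Each resulting cluster is read as one real-world person. Giving named entities a higher weight than ordinary words reflects the idea that features should not all be treated alike; the actual weights would have to be tuned on data.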
Finally, this paper proposes a two-stage name disambiguation method based on exclusive and non-exclusive, which combines the traditional hierarchical clustering method with the method based on the framework of people's attributes and relationships. In comparative experiments against the method based on traditional clustering alone, the two-stage method performs better: the average F1 score increases by 3.1 percentage points under the purity evaluation method and by 4.2 percentage points under the BCubed evaluation method. These results show the effectiveness of our methods.
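For readers unfamiliar with the two evaluation methods named above, the following sketch shows how such scores are commonly computed. It assumes the standard definitions: BCubed precision and recall averaged per document, and a purity-based F score taken as the harmonic mean of purity and inverse purity. Whether the thesis uses exactly these variants is an assumption, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def bcubed(pred, gold):
    """BCubed precision, recall, and F1.

    pred[i] is the cluster id assigned to document i and gold[i] is the
    true person id of document i.
    """
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")
    n = len(pred)
    pair_counts = Counter(zip(pred, gold))  # documents sharing cluster and person
    pred_sizes = Counter(pred)
    gold_sizes = Counter(gold)
    precision = sum(pair_counts[(p, g)] / pred_sizes[p] for p, g in zip(pred, gold)) / n
    recall = sum(pair_counts[(p, g)] / gold_sizes[g] for p, g in zip(pred, gold)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def purity_f1(pred, gold):
    """Purity, inverse purity, and their harmonic mean."""
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")
    n = len(pred)
    best_per_cluster = {}
    best_per_person = {}
    for (p, g), count in Counter(zip(pred, gold)).items():
        best_per_cluster[p] = max(best_per_cluster.get(p, 0), count)
        best_per_person[g] = max(best_per_person.get(g, 0), count)
    purity = sum(best_per_cluster.values()) / n
    inverse_purity = sum(best_per_person.values()) / n
    f1 = 2 * purity * inverse_purity / (purity + inverse_purity)
    return purity, inverse_purity, f1


# Four documents: the system puts the first three in one cluster,
# while the gold standard splits them into persons "a", "a", "b", "b".
pred = [0, 0, 0, 1]
gold = ["a", "a", "b", "b"]
b_precision, b_recall, b_f1 = bcubed(pred, gold)
purity, inverse_purity, p_f1 = purity_f1(pred, gold)
print(round(b_f1, 4), round(p_f1, 4))
```

BCubed scores each document individually, so it penalizes a document placed in the wrong cluster even when the cluster's majority label is correct, which is one reason the two metrics can report different gains for the same system.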

Keywords/Search Tags:

natural language processing, name disambiguation, text clustering, attributes and relations framework

Related items

1	Explore The Construction Of A Natural Language Programming Framework In Auditing
2	Research On Chinese Polysemy Disambiguation Method Based On VCK-vector Model
3	Design And Implementation Of Probabilistic Disambiguation Model Based On BCG
4	Research On Word-level Ambiguity Resolution Method
5	Design And Implements Of WSD System Based On Chinese Real Text
6	Word Sense Disambiguation Corpus Automatic Acquisition
7	Prepositional Phrase Attachment Disambiguation Of Natural Language Processing
8	Research On Text Classification Based On Natural Language Processing And Machine Learning
9	Research On Construction Of Entity-Attributes-Framework Semantic Knowledge Base
10	Research On Text Representation Model And Application In Text Classification And Natural Language Inference