Research On Name Disambiguation In The Field Of Journalism

Posted on: 2011-05-11; Degree: Master; Type: Thesis
Country: China; Candidate: C Li; Full Text: PDF
GTID: 2248330395958348; Subject: Computer software and theory
Abstract/Summary:
Name disambiguation is one of the most important problems in information retrieval, data mining, and related fields. Many algorithms have been proposed for it and notable results have been achieved. However, because application areas and data differ and the task itself is complex, many problems remain. This thesis first applies a traditional clustering algorithm to name disambiguation. To make use of background knowledge, it then proposes a name disambiguation method based on a framework of people's attributes and relationships, and a two-stage name disambiguation method based on exclusive and non-exclusive.
We use the traditional hierarchical clustering algorithm for name disambiguation by transforming the task into document clustering, and we improve the feature selection method. We run comparative experiments on different weight calculation methods and on different methods for calculating the distance between clusters. However, name disambiguation based on traditional clustering uses only words as features and does not distinguish between kinds of features. We therefore propose a similarity calculation method based on named entities and entity words that combines the two kinds of features with different weights. On the task studied in this thesis, the single-link method combined with the similarity calculation based on named entities and entity words gives the better performance.
However, because document themes are diverse and people have their own characteristics, a person category cannot always be represented by the document theme, and the document theme is sometimes unclear. This thesis therefore proposes a name disambiguation method based on a framework of people's attributes and relationships. We first identify the attributes and relevant entities, and then use background knowledge to judge whether two mentions refer to the same person.
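The clustering step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the cosine measure, the weights 0.7 and 0.3, the stopping threshold, and the names `combined_similarity` and `single_link_clusters` are all assumptions chosen for illustration. Each document is represented by two bags of features, named entities and entity words, and single-link clustering merges the two clusters whose closest pair of documents is most similar.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(doc_a, doc_b, w_entity=0.7, w_word=0.3):
    """Weighted sum of two feature-specific similarities.

    The weights are illustrative assumptions, not values from the thesis.
    """
    return (w_entity * cosine(doc_a["entities"], doc_b["entities"])
            + w_word * cosine(doc_a["words"], doc_b["words"]))


def single_link_clusters(docs, threshold=0.3):
    """Agglomerative clustering with single-link cluster similarity.

    Cluster similarity is the maximum pairwise document similarity.
    Merging stops when the best pair falls below `threshold` (assumed).
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(combined_similarity(docs[i], docs[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]
    return clusters


# Toy documents mentioning the same ambiguous name (made-up data).
docs = [
    {"entities": Counter({"Xinhua": 2, "Beijing": 1}),
     "words": Counter({"reporter": 1, "news": 2})},
    {"entities": Counter({"Xinhua": 1, "Beijing": 1}),
     "words": Counter({"news": 1, "editor": 1})},
    {"entities": Counter({"Shanghai": 1, "FC": 1}),
     "words": Counter({"goal": 2, "match": 1})},
]
clusters = single_link_clusters(docs, threshold=0.3)
```

With these toy inputs the first two documents share entities and words and end up in one cluster, while the third stays separate. Giving named entities a larger weight reflects the idea in the abstract that features should not all be treated alike; the right weights would have to be tuned on real data.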
Finally, this thesis proposes a two-stage name disambiguation method based on exclusive and non-exclusive, which combines the method based on traditional hierarchical clustering with the method based on the framework of people's attributes and relationships. In comparative experiments, the two-stage method performs better than the method based on the traditional clustering algorithm alone. The average F1 score increases by 3.1 percentage points under the purity evaluation method and by 4.2 percentage points under the BCubed evaluation method. These results show the effectiveness of the proposed methods.
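For readers unfamiliar with the two evaluation methods named above, the following sketch shows one common way to compute them. It assumes that the purity-based F1 is the harmonic mean of purity and inverse purity, and that the BCubed F1 is the harmonic mean of averaged per-document precision and recall; the abstract does not state the exact formulas used, and the function names are hypothetical.

```python
from collections import Counter


def f1(p, r):
    """Harmonic mean of two scores."""
    return 2 * p * r / (p + r) if p + r else 0.0


def bcubed(pred, gold):
    """BCubed precision, recall and F1 for non-empty label lists.

    pred[i] is the predicted cluster of document i, gold[i] its true person.
    """
    n = len(pred)
    precision = recall = 0.0
    for i in range(n):
        same_cluster = {j for j in range(n) if pred[j] == pred[i]}
        same_class = {j for j in range(n) if gold[j] == gold[i]}
        correct = len(same_cluster & same_class)
        precision += correct / len(same_cluster)
        recall += correct / len(same_class)
    precision /= n
    recall /= n
    return precision, recall, f1(precision, recall)


def purity_scores(pred, gold):
    """Purity, inverse purity and their harmonic mean (non-empty input)."""
    n = len(pred)
    best_per_cluster, best_per_class = {}, {}
    for (c, g), count in Counter(zip(pred, gold)).items():
        best_per_cluster[c] = max(best_per_cluster.get(c, 0), count)
        best_per_class[g] = max(best_per_class.get(g, 0), count)
    purity = sum(best_per_cluster.values()) / n
    inverse_purity = sum(best_per_class.values()) / n
    return purity, inverse_purity, f1(purity, inverse_purity)


# Made-up example: four documents, two predicted clusters, two true persons.
pred = [0, 0, 0, 1]
gold = ["a", "a", "b", "b"]
b_p, b_r, b_f = bcubed(pred, gold)
pur, inv, pur_f = purity_scores(pred, gold)
```

BCubed scores each document individually and so penalizes a wrongly merged document in proportion to the cluster it pollutes, whereas purity only looks at the dominant person in each cluster. Reporting both, as the abstract does, gives a fuller picture of clustering quality.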
Keywords/Search Tags: natural language processing, name disambiguation, text clustering, attributes and relations framework