Research On Name Disambiguation And Its Application

Posted on: 2010-04-14

Degree: Master

Type: Thesis

Country: China

Candidate: F Wang

Full Text: PDF

GTID: 2178360278462165

Subject: Computer Science and Technology

Abstract/Summary:

Name disambiguation aims to resolve the problem of multiple persons sharing the same name, also called name ambiguity. Name ambiguity is a common problem in the real world and, especially with the rapid growth of the Web, has become one of the challenges for information integration, information retrieval, and data mining applications. In this thesis, we conduct a thorough investigation of this problem. Specifically, we study it using academic data. We formally define the problem of name disambiguation in the academic social network and propose two approaches to solve it: an atomic cluster-based disambiguation approach and a constraint-based topic modeling approach.

Because data points in the academic social network are usually sparse, traditional clustering algorithms often fail to perform well. We therefore propose an atomic cluster-based disambiguation approach consisting of two stages. In the first stage, an extended AdaBoost algorithm automatically detects atomic clusters, within which points are strongly connected. In the second stage, different clustering methods are applied to the detected atomic clusters to obtain the final name disambiguation result. Experimental results show that the atomic cluster method significantly outperforms traditional clustering methods, by an average of 25% over k-means and 8% over hierarchical clustering.

We further study a topic modeling approach to name disambiguation. The basic idea is to map data points from the original feature space to a topic space. However, traditional topic models cannot find good topic distributions in researchers' social network data. We therefore propose a constraint-based topic model to overcome this limitation: we define five types of constraints based on background knowledge and incorporate them into the objective function of the topic model.
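The abstract does not specify the five constraint types or the exact form of the adapted sampler. As an illustration only, the sketch below shows collapsed Gibbs sampling for LDA with one hypothetical soft must-link constraint between papers (for example, pairs of papers assumed to share a coauthor). Each token's topic conditional is multiplied by `exp(lam * bonus)`, where `bonus` measures how much the linked papers already use that topic. The function name, the `must_link` pairs, and the `lam` weight are assumptions for illustration, not the thesis's formulation.

```python
import math
import random


def constrained_lda_gibbs(docs, vocab_size, num_topics, must_link,
                          alpha=0.1, beta=0.01, lam=1.0, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with a hypothetical soft must-link term.

    docs: list of documents (papers), each a list of word ids in [0, vocab_size).
    must_link: list of (d1, d2) document pairs assumed to belong together.
    lam: strength of the constraint term (an illustrative assumption).
    Returns per-document topic proportions (theta).
    """
    rng = random.Random(seed)
    K, V, D = num_topics, vocab_size, len(docs)
    neighbors = [[] for _ in range(D)]
    for a, b in must_link:
        neighbors[a].append(b)
        neighbors[b].append(a)

    n_dk = [[0] * K for _ in range(D)]  # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # tokens of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                # Remove the current token from all counts before resampling it.
                n_dk[d][k_old] -= 1
                n_kw[k_old][w] -= 1
                n_k[k_old] -= 1
                weights = []
                for k in range(K):
                    # Standard collapsed Gibbs conditional for LDA.
                    p = (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                    # Soft must-link term: favor topics already used by linked docs.
                    bonus = sum(n_dk[e][k] / max(len(docs[e]), 1) for e in neighbors[d])
                    weights.append(p * math.exp(lam * bonus))
                k_new = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k_new
                n_dk[d][k_new] += 1
                n_kw[k_new][w] += 1
                n_k[k_new] += 1

    # Smoothed topic proportions for each document.
    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]
```

Setting `lam=0` recovers plain collapsed Gibbs sampling for LDA. The returned rows are the topic-space representation of each paper that a later clustering step could group.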
An adapted Gibbs sampling algorithm is employed to estimate the parameters of the model. Finally, based on the discovered topics, we use a hierarchical clustering method to obtain the final disambiguation results. Experiments show that the constraint-based method significantly improves name disambiguation performance.

We apply the proposed name disambiguation approach to a real-world academic search system, ArnetMiner, into which a name disambiguation module has been designed and integrated.
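Both approaches in the abstract end with a clustering step over intermediate structures. A minimal sketch of the two-stage atomic cluster pipeline follows. The extended AdaBoost stage is not reproduced. As an assumption, a fixed threshold on a hypothetical Jaccard similarity over coauthor sets stands in for the learned pairwise decision, and connected components of the "strong" pairs become atomic clusters. Stage 2 uses average-linkage agglomerative merging as one possible clustering method. The names `atomic_clusters`, `merge_clusters`, and `jaccard` are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Hypothetical pairwise similarity: Jaccard overlap of two coauthor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def atomic_clusters(n, strong_pairs):
    """Stage 1 (sketch): connected components over 'strongly connected' pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in strong_pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def merge_clusters(clusters, sim, k):
    """Stage 2 (sketch): average-linkage agglomerative merging down to k clusters."""
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > k:
        best, best_pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            s = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
            s /= len(clusters[i]) * len(clusters[j])
            if best is None or s > best:
                best, best_pair = s, (i, j)
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid
    return clusters
```

For example, with papers whose coauthor sets are `{"A", "B"}`, `{"A", "B", "C"}`, `{"C", "D"}`, `{"X", "Y"}`, `{"X", "Z"}`, `{"Y", "Z"}` and a strong-pair threshold of 0.5, only papers 0 and 1 form a non-trivial atomic cluster, and merging down to two clusters yields `[[0, 1, 2], [3, 4, 5]]`. Starting from atomic clusters rather than single papers lets the sparse per-paper similarities be averaged over groups that are already reliable.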

Keywords/Search Tags:

Name Disambiguation, Atomic Cluster, Constraint, Topic Model

Related items

1	An Approach To Predicting Concurrency Bugs Based On Constraint Solving
2	Research And Implementation Of Web Name Disambiguation
3	Research And Implementation Of Name Entity Disambiguation
4	Text Clustering Method In Topic Detection And Person Name Disambiguation
5	Domain Entity Disambiguation And Link Prediction Based On Representation Learning
6	Research On Word Sense Disambiguation Based On K-means Cluster And LSTM
7	Research On Geometric Constraint Solving Based On Cluster
8	The Research On Academic Paper Author Name Disambiguation
9	Research On Chinese Person Name Disambiguation Algorithm
10	Research Of Text Processing Method And Application Based On Attention Mechanism And Word Sense Disambiguation