
The Research And Application Of Name Disambiguation Algorithm Based On Multi-level Clustering

Posted on: 2014-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 2248330395999156
Subject: Computer application technology
Abstract/Summary:
Name ambiguity arises when a single name corresponds to multiple real individuals. Name disambiguation aims to separate those individuals from one another. This paper classifies and defines the name disambiguation problem in the field of literature management and, motivated by the same-name problems encountered in literature resource management systems, proposes a multi-level clustering algorithm that combines the coauthor network with latent relations among venues.

Because coauthorship is strong evidence for name disambiguation, this paper proposes a coauthor-based method and formalizes it. Analysis of the results shows that coauthors themselves can share names. To further improve the algorithm, we introduce the statistically based concept of a "strong author". Experimental results show that the improved method achieves high precision.

The venues in which researchers from different fields publish their papers show certain regularities. Drawing on the application of Latent Semantic Analysis in text mining, this paper uses Non-negative Matrix Factorization to mine and represent the latent relations among venues from the perspective of authors and venues. Because the latent relations are mined during the factorization itself, the relation model is simple and efficient in practice. Experiments show that our method represents the relations among venues correctly.

Building on the above methods, this paper clusters the mixed papers hierarchically, using a different approach for each of the coauthor, title, and venue features. Experiments confirm that our algorithm outperforms the supervised DISTINCT and the unsupervised Arnetminer and CSLR. In addition, our method is 1-2 orders of magnitude faster than CSLR in execution time.

Finally, we apply the name disambiguation algorithm to Linkscholar, a platform designed for sharing literature resources through Web Service.
Keywords/Search Tags: Name Disambiguation, Non-negative Matrix Factorization, Coauthor Network, Latent Semantic Analysis
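
The abstract's idea of mining latent relations among venues with Non-negative Matrix Factorization can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes an author-by-venue matrix of paper counts, the standard multiplicative-update rule for the factorization, and cosine similarity between venue factor vectors as the relation measure. The toy counts, the venue names, and the helper names (`nmf`, `venue_similarity`) are hypothetical.

```python
import math
import random

# Hypothetical toy data: COUNTS[a][v] = papers by author a in venue v.
# Authors 0-1 publish only in the first two venues, authors 2-3 in the last two.
VENUES = ["venue_A", "venue_B", "venue_C", "venue_D"]
COUNTS = [
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor v (n x m) into non-negative w (n x rank) and h (rank x m)
    using multiplicative updates that minimise the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def venue_similarity(h, eps=1e-12):
    """Cosine similarity between the latent vectors (columns of h) of all venues."""
    vecs = transpose(h)
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in vecs]
    return [[sum(x * y for x, y in zip(a, b)) / (na * nb + eps)
             for b, nb in zip(vecs, norms)]
            for a, na in zip(vecs, norms)]


if __name__ == "__main__":
    _, h = nmf(COUNTS, rank=2)
    sim = venue_similarity(h)
    for name, row in zip(VENUES, sim):
        print(name, [round(s, 2) for s in row])
```

In this sketch, venues that share authors end up with similar latent vectors, so the similarity matrix could serve as one venue-level signal when clustering papers. How the thesis actually builds its matrix and uses the factors is not stated in the abstract.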