
Research On Name Disambiguation Algorithm Based On Multi-view Non-negative Matrix Factorization

Posted on: 2016-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Huang
Full Text: PDF
GTID: 2308330461478530
Subject: Software engineering
Abstract/Summary:
With the rapid development of research fields and Internet technology, the volume of scientific literature on academic resource platforms has grown explosively, which makes name ambiguity on these platforms more prominent. Name ambiguity arises when distinct authors share the same or a similar name, so that their publications are mixed together. It severely reduces the accuracy of publication retrieval and of the algorithms that build on it. Name disambiguation on academic resource platforms is therefore an important research topic.

To address name ambiguity on academic resource platforms, this paper proposes a name disambiguation algorithm based on multi-view non-negative matrix factorization (multi-view NMF). The main work consists of the following aspects:

Building on an analysis of the advantages of multi-view clustering, this paper proposes a novel clustering algorithm based on multi-view NMF (MvNMF) and studies its model, iterative algorithm, time and space complexity, and convergence. Experiments show that the proposed algorithm outperforms other multi-view clustering algorithms in clustering quality.

Because name disambiguation data is sparse, this paper further develops the iterative algorithm of MvNMF. Building on the coordinate descent method, it proposes two iterative algorithms with variable selection, a greedy-based algorithm and a roulette-based algorithm, and analyzes their time and space complexity and their convergence. Experiments demonstrate that, on sparse data, both algorithms with variable selection outperform the coordinate descent method in convergence speed and clustering quality.
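As a rough illustration of the greedy-based and roulette-based variable selection named above, the sketch below chooses which coordinate to update next from a list of per-coordinate scores. The abstract does not define the score or the sampling scheme, so everything here is an assumption: `scores` stands for some non-negative estimate of how much updating each coordinate would reduce the objective, and the function names `select_greedy` and `select_roulette` are hypothetical.

```python
import random


def select_greedy(scores):
    """Return the index of the largest score (assumed greedy rule)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def select_roulette(scores, rng=random):
    """Sample an index with probability proportional to its score
    (assumed roulette-wheel rule). Scores must be non-negative."""
    total = sum(scores)
    if total <= 0:
        # No coordinate promises any improvement: fall back to uniform.
        return rng.randrange(len(scores))
    threshold = rng.random() * total
    running = 0.0
    for i, s in enumerate(scores):
        running += s
        if threshold < running:
            return i
    return len(scores) - 1  # guard against floating-point round-off


# Hypothetical scores for four coordinates; on sparse data most are zero.
scores = [0.0, 0.5, 0.0, 2.0]
greedy_pick = select_greedy(scores)
roulette_pick = select_roulette(scores, random.Random(0))
```

Under these assumptions, both rules skip coordinates whose score is zero, which is one plausible reason variable selection helps on sparse data. The greedy rule always takes the single most promising coordinate, while the roulette rule still gives lower-scoring coordinates a chance.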
Although the roulette-based algorithm is slightly worse than the greedy-based algorithm in convergence speed, it outperforms the latter in clustering quality.

Building on the above algorithms and an analysis of the attributes of publication metadata, this paper selects title, author names, and venue as the evidence for disambiguation and proposes a name disambiguation algorithm based on MvNMF. The algorithm first performs a preliminary clustering using co-authorship evidence to obtain accurate fragments, which provide the base data for the title and venue evidence. It then applies MvNMF to cluster the fragments produced by the preliminary clustering and thereby achieves name disambiguation. Experiments confirm that the proposed algorithm is balanced and outperforms existing state-of-the-art name disambiguation algorithms by 1.9% to 24.6% in average pairwise F1 score.
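For readers unfamiliar with the pairwise F1 score used in the evaluation above, the sketch below computes it in the way the metric is commonly defined for clustering: every pair of publications placed in the same predicted cluster is checked against whether the two publications truly belong to the same author. The thesis may differ in details such as averaging across names. The function names and the toy labels are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Set of index pairs (i, j), i < j, whose items share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(predicted, truth):
    """Harmonic mean of pairwise precision and pairwise recall."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: five publications under one ambiguous name.
truth = ["a", "a", "a", "b", "b"]   # true author of each publication
predicted = [0, 0, 1, 1, 1]         # cluster assigned by some algorithm
score = pairwise_f1(predicted, truth)
```

In the toy example, two of the four predicted pairs are correct and two of the four true pairs are recovered, so precision, recall, and the resulting pairwise F1 are all 0.5.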
Keywords/Search Tags: Name Disambiguation, Multi-view Non-negative Matrix Factorization, Variable Selection, Clustering