Research On Name Disambiguation Algorithm Based On Multi-view Non-negative Matrix Factorization

Posted on:2016-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Z Huang

Full Text:PDF

GTID:2308330461478530

Subject:Software engineering

Abstract/Summary:


With the rapid development of research fields and Internet technology, the volume of scientific literature on academic resource platforms has grown explosively, and this growth makes name ambiguity on these platforms increasingly prominent. Name ambiguity arises when distinct author entities share the same or a similar name, so that the publications of different authors are mixed together. It severely reduces the accuracy of publication retrieval and of the downstream algorithms that depend on it. Name disambiguation on academic resource platforms is therefore an important research topic.

To address name ambiguity on academic resource platforms, this paper proposes a name disambiguation algorithm based on multi-view non-negative matrix factorization (multi-view NMF). The main research work consists of the following aspects.

Based on an analysis of the advantages of multi-view clustering, this paper proposes a novel clustering algorithm based on multi-view NMF (MvNMF) and studies it comprehensively in terms of its model, iterative algorithm, time and space complexity, and convergence. Experiments show that the proposed algorithm achieves better clustering results than other multi-view clustering algorithms.

Because name disambiguation data are sparse, this paper further develops the iterative algorithm of MvNMF. Building on the coordinate descent method, it proposes two iterative algorithms with variable selection, a greedy-based algorithm and a roulette-based algorithm, and analyzes their time and space complexity and their convergence. Experiments demonstrate that on sparse data both variable-selection algorithms outperform plain coordinate descent in convergence speed and in clustering effect.
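The two variable-selection rules can be illustrated on plain single-view NMF. The sketch below is an illustration under stated assumptions, not the MvNMF algorithm of this paper: it minimizes 0.5·||X - WH||²_F by coordinate descent, computes the closed-form optimal non-negative step and objective decrease for every entry, and then either picks the entry with the largest decrease (greedy) or samples an entry with probability proportional to its decrease (roulette). The function names `coordinate_steps`, `update_factor`, and `nmf_cd`, the per-factor step budget, and the exact roulette weighting are hypothetical.

```python
import numpy as np


def coordinate_steps(W, G, HHt):
    """Closed-form step S and objective decrease D for every entry of W.

    With all other entries fixed, 0.5 * ||X - W H||_F^2 is a 1-D quadratic
    in W[i, r], so the best non-negative step has a closed form.
    """
    d = np.maximum(np.diag(HHt), 1e-12)             # curvature of each column r
    S = np.maximum(0.0, W - G / d) - W              # projected 1-D Newton step
    D = np.maximum(-G * S - 0.5 * d * S ** 2, 0.0)  # decrease, >= 0 up to rounding
    return S, D


def update_factor(X, W, H, rule, rng, tol=1e-12):
    """Update W in place (H fixed) with greedy or roulette variable selection."""
    HHt = H @ H.T
    G = W @ HHt - X @ H.T                           # gradient w.r.t. W
    S, D = coordinate_steps(W, G, HHt)
    for _ in range(W.size):                         # hypothetical step budget
        if rule == "greedy":
            idx = int(np.argmax(D))                 # entry with the largest decrease
            if D.flat[idx] <= tol:
                break
        elif rule == "roulette":
            total = D.sum()
            if total <= tol:
                break
            # hypothetical weighting: probability proportional to the decrease
            idx = int(rng.choice(D.size, p=D.ravel() / total))
        else:
            raise ValueError(f"unknown rule: {rule}")
        i, r = np.unravel_index(idx, D.shape)
        s = S[i, r]
        W[i, r] += s
        G[i, :] += s * HHt[r, :]                    # only row i of G changes
        S_i, D_i = coordinate_steps(W[i:i + 1], G[i:i + 1], HHt)
        S[i], D[i] = S_i[0], D_i[0]
    return W


def nmf_cd(X, k, n_iter=20, rule="greedy", seed=0):
    """Alternate W and H updates; return W, H and the objective history."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))

    def objective():
        return 0.5 * np.linalg.norm(X - W @ H) ** 2

    history = [objective()]
    for _ in range(n_iter):
        update_factor(X, W, H, rule, rng)
        # X^T ~ H^T W^T, so the same routine updates H through its transpose
        H = update_factor(X.T, H.T.copy(), W.T, rule, rng).T
        history.append(objective())
    return W, H, history
```

Every accepted step is the exact minimizer along one coordinate, so the objective never increases under either rule; the roulette rule trades some per-step decrease for randomness in which entries get updated. In this simple version each selection scans all entries; selecting within one row at a time would make each step cheaper.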
Of the two variable-selection algorithms, the roulette-based algorithm converges slightly more slowly than the greedy-based algorithm but achieves a better clustering effect.

On the basis of the above algorithms and an analysis of the attributes of publication metadata, this paper selects the title, author names, and venue as evidence for disambiguation and proposes a name disambiguation algorithm based on MvNMF. The algorithm first performs a preliminary clustering using co-authorship evidence to obtain accurate fragments, which provide the base data for the title and venue evidence. It then applies MvNMF to cluster the fragments obtained by the preliminary clustering, which completes the disambiguation. Experiments confirm that the proposed algorithm is well balanced and outperforms existing state-of-the-art name disambiguation algorithms by 1.9% to 24.6% in average pairwise F1 score.
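Pairwise F1, the metric used in the evaluation, treats every pair of publications under one ambiguous name as a binary decision: a pair is predicted positive when both papers land in the same cluster, and is truly positive when both belong to the same author entity. A minimal sketch follows. The helper names are hypothetical, and two details are assumptions because the abstract does not specify them: a score of 0 when a denominator has no pairs, and averaging F1 per name (a macro average).

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """All index pairs (a, b), a < b, whose labels are equal."""
    return {(a, b) for a, b in combinations(range(len(labels)), 2)
            if labels[a] == labels[b]}


def pairwise_f1(predicted, truth):
    """Pairwise precision, recall and F1 for the papers of one ambiguous name.

    predicted[i] is the cluster assigned to paper i; truth[i] is its true author.
    """
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    hits = len(pred_pairs & true_pairs)
    # assumption: an empty denominator scores 0
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_pairwise_f1(names):
    """Assumed macro average: mean F1 over names, each mapped to (predicted, truth)."""
    scores = [pairwise_f1(p, t)[2] for p, t in names.values()]
    return sum(scores) / len(scores)


print(pairwise_f1([0, 0, 1, 1, 1], ["A", "A", "A", "B", "B"]))  # (0.5, 0.5, 0.5)
```

In the example, two of the four predicted same-cluster pairs are correct, and two of the four true same-author pairs are recovered, so precision, recall, and F1 are all 0.5. Enumerating pairs is quadratic in the number of papers per name, which is acceptable for a per-name sketch.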

Keywords/Search Tags:

Name Disambiguation, Multi-view Non-negative Matrix Factorization, Variable Selection, Clustering


Related items

1	The Research Of Multi-view Latent Association Mining Based On Image Set And Non-negative Matrix Factorization
2	The Research Of Incomplete Multi-View Clustering Algorithm Based On Matrix Factorization
3	Research On Multi-view Clustering Methods Based On Non-negative Matrix Factorization
4	Research On Multi-view Clustering
5	Multi-View Clustering Based On Integrated Weight Learning And Non-Negative Matrix Factorization
6	Multi-view Clustering Based On Non-negative Matrix Factorization
7	The Research And Application Of Name Disambiguation Algorithm Based On Multi-level Clustering
8	Research On Semi-Supervised Multi-View Clustering And Two-View Multi-Instance Clustering
9	Multi-view Clustering Based On Deep Graph Regularized Non-negative Matrix Factorization
10	Researches On Diversity Multi-view Clustering