
Research on Author Name Disambiguation Based on the GSDPMM Algorithm

Posted on: 2021-02-23   Degree: Master   Type: Thesis
Country: China   Candidate: K X Wang   Full Text: PDF
GTID: 2428330602480863   Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer and communication technology, human society has entered the information age. The form, scale, access, storage, and transmission of information and knowledge have changed dramatically, and with the growth of the Internet the information explosion has become increasingly apparent, bringing new problems with it. Ambiguity of author names in electronic publications is one of the most prominent of these problems. In many fields, such as literature management and social network analysis, name disambiguation is regarded as an important but challenging task. Author name disambiguation for papers uses the titles, author names, author affiliations, abstracts, keywords, and other information of papers to assign each paper to the cluster of the correct real-world author. Researchers have proposed many solutions to this problem. According to their relationship with machine learning, these methods can be divided into supervised, unsupervised, semi-supervised, graph-based, and heuristic-based algorithms. This thesis proposes an efficient unsupervised author name disambiguation method that incorporates several heuristic features. The method is built mainly on text clustering with GSDPMM (a Gibbs sampling algorithm for the Dirichlet Process Multinomial Mixture model). Based on a detailed analysis of GSDPMM, we modify the algorithm to make it better suited to this application. Experiments show that the algorithm still achieves good results on a data set that is not very clean. In addition, the running time of the algorithm grows linearly with the number of documents, so it can be applied to scenarios with large amounts of data. The thesis also discusses how to set several parameters of the algorithm.
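The clustering idea can be illustrated with a small Python sketch. It is not the thesis's modified algorithm: it shows a plain collapsed Gibbs sampler for a Dirichlet process multinomial mixture, the model GSDPMM is built on, with each paper turned into a bag of field-prefixed tokens. The feature scheme (`paper_to_tokens` and the `co:`/`org:`/`t:`/`kw:` prefixes), the function names, and the default values of `alpha` and `beta` are hypothetical choices made for illustration, and the original GSDPMM paper's exact parameterization of the prior may differ.

```python
import math
import random
from collections import Counter


def paper_to_tokens(paper, target_name):
    """Hypothetical feature scheme: one token per field value, prefixed by field
    so that a coauthor name never collides with a title word."""
    def norm(s):
        return s.lower().replace(" ", "_")
    tokens = ["co:" + norm(a) for a in paper.get("authors", []) if a != target_name]
    if paper.get("org"):
        tokens.append("org:" + norm(paper["org"]))
    tokens += ["t:" + w for w in paper.get("title", "").lower().split()]
    tokens += ["kw:" + norm(k) for k in paper.get("keywords", [])]
    return tokens


def dpmm_gibbs(docs, alpha=0.1, beta=0.05, iters=20, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet process multinomial mixture.
    docs: list of token lists. Returns one cluster id (author entity) per document."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})     # vocabulary size
    counts = [Counter(d) for d in docs]
    m, n, nw = {}, {}, {}                     # docs per cluster, tokens per cluster, word counts
    z = [None] * len(docs)
    next_id = 0

    def update(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] = m.get(k, 0) + sign
        n[k] = n.get(k, 0) + sign * len(docs[d])
        wc = nw.setdefault(k, Counter())
        for w, c in counts[d].items():
            wc[w] += sign * c
        if m[k] == 0:                         # drop empty clusters
            del m[k], n[k], nw[k]

    def log_weight(d, k):
        # Unnormalized log p(z_d = k | rest); k is None for a brand-new cluster.
        # The denominator (D - 1 + alpha) is shared by all options, so it is omitted.
        if k is None:
            lw, nzw, nz = math.log(alpha), Counter(), 0
        else:
            lw, nzw, nz = math.log(m[k]), nw[k], n[k]
        for w, c in counts[d].items():
            for j in range(c):
                lw += math.log(nzw[w] + beta + j)
        for i in range(len(docs[d])):
            lw -= math.log(nz + V * beta + i)
        return lw

    def sample(d):
        nonlocal next_id
        options = list(m) + [None]            # every existing cluster plus a new one
        logs = [log_weight(d, k) for k in options]
        top = max(logs)
        choice = rng.choices(options, weights=[math.exp(x - top) for x in logs])[0]
        if choice is None:                    # open a new cluster (new author entity)
            choice, next_id = next_id, next_id + 1
        return choice

    for d in range(len(docs)):                # initialization pass, one document at a time
        z[d] = sample(d)
        update(d, z[d], +1)
    for _ in range(iters):                    # Gibbs sweeps
        for d in range(len(docs)):
            update(d, z[d], -1)
            z[d] = sample(d)
            update(d, z[d], +1)
    return z


if __name__ == "__main__":
    papers = [
        {"authors": ["Wei Wang", "Li Zhang"], "org": "Tsinghua University", "title": "Graph clustering"},
        {"authors": ["Wei Wang", "Li Zhang"], "org": "Tsinghua University", "title": "Spectral graph methods"},
        {"authors": ["Wei Wang", "Hong Liu"], "org": "Fudan University", "title": "Protein folding"},
    ]
    docs = [paper_to_tokens(p, "Wei Wang") for p in papers]
    print(dpmm_gibbs(docs, iters=10))
```

Each sweep scores every document against every current cluster plus one new cluster, so one iteration costs roughly O(D·K·L) for D documents, K clusters, and average document length L. With K and L bounded, this is linear in D, which is consistent with the scaling behaviour the abstract reports. A larger `alpha` makes new clusters (new author entities) more likely, while a smaller `beta` makes clusters more sensitive to shared tokens.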
Keywords/Search Tags: GSDPMM, unsupervised learning, name disambiguation, text clustering