A Study On Methods Of Author Name Disambiguation In Academic Literature

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: S W Tu
GTID: 2428330620968081
Subject: Business analysis
Abstract/Summary:
With the rapid growth of global academic research, the volume of academic literature is increasing day by day. To cope with the challenge of managing documents at this scale, academic literature databases, academic search platforms, and academic knowledge graphs are changing the way we organize, manage, query, and acquire academic documents. However, author name ambiguity is widespread in academic documents, so the many papers that share the same author name cannot be distinguished directly. For example, in academic information retrieval, a search based on an author's name cannot accurately return that author's own publications. Author name ambiguity also creates obstacles for research in information science and bibliometrics, for instance by affecting the accuracy of academic evaluation. It is therefore vital to disambiguate authors who share the same name in academic literature. In this context, this thesis focuses on the problem of author name disambiguation in academic literature. Its main contents and innovations can be summarized in two parts:

(1) For the cold-start scenario of author name disambiguation, an author name disambiguation method is proposed that combines heterogeneous graph network features with the semantic features of academic literature. Based on a heterogeneous graph network constructed from academic papers, authors, and organizations, the method learns a relationship representation vector for each paper through metapath-based random walks. word2vec is then used to extract semantic features from each paper and generate its semantic representation vector. A similarity matrix is then obtained by similarity calculation. Finally, the DBSCAN clustering method is used to disambiguate the same-name authors.

(2) For the incremental disambiguation scenario, the task is treated as a similarity matching problem. A multi-feature fusion similarity calculation method is proposed to disambiguate newly added papers. The method extracts basic text similarity features from the academic literature data. To obtain semantic similarity features, the pre-trained language model BERT is used to extract semantic features, and the similarity between the new paper's feature vector and the feature vector of each candidate author is then calculated. The text similarity features and semantic similarity features are combined, and an XGBoost model is used to perform the matching: the new paper is assigned to the author with the highest similarity, which completes the incremental disambiguation.
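The last two steps of the cold-start pipeline (building a similarity matrix from the two kinds of paper vectors, then clustering with DBSCAN) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the thesis's implementation: the function names, the equal-weight blend of the two cosine similarities (the abstract does not say how the two views are combined), the `eps` and `min_pts` values, and the toy vectors, which stand in for real metapath and word2vec embeddings. The small pure-Python DBSCAN is written out only to keep the example dependency-free.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def distance_matrix(relation_vecs, semantic_vecs, weight=0.5):
    """Blend the two similarity views (assumed weighting) and return 1 - similarity."""
    n = len(relation_vecs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = (weight * cosine(relation_vecs[i], relation_vecs[j])
                   + (1 - weight) * cosine(semantic_vecs[i], semantic_vecs[j]))
            dist[i][j] = dist[j][i] = 1.0 - sim
    return dist

def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; returns one label per point, -1 = noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # noise reachable from a core point: border point
            if labels[q] is not None:
                continue  # already assigned to a cluster
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # q is a core point, keep expanding
                queue.extend(q_neighbors)
    return labels

# Toy data (made up): five papers published under one ambiguous author name.
# Papers 0-2 are intended to belong to one person, papers 3-4 to another.
relation_vecs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
semantic_vecs = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 1.0]]

dist = distance_matrix(relation_vecs, semantic_vecs)
labels = dbscan(dist, eps=0.2, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Each cluster label corresponds to one inferred real-world author. A density-based method fits this task because the number of distinct authors behind a name is not known in advance, and papers that match no group can be left as noise (label -1) instead of being forced into a cluster.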
Keywords/Search Tags: Author Name Disambiguation, Clustering, Heterogeneous Network, Network Embedding, Similarity Match