
Research On Author Name Disambiguation Method Based On Network Representation Learning

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
GTID: 2518306542963359
Subject: Computer technology
Abstract/Summary:
With the advent of the information age, academic research worldwide has developed rapidly, and the number of published academic papers has grown dramatically. To address the difficulty of managing this mass of literature, large-scale literature databases and academic retrieval platforms built on Internet technology have emerged, greatly changing the way researchers search for literature. However, many different authors in literature retrieval systems share the same name, so a large number of papers by same-name authors cannot be correctly attributed, which reduces the accuracy of academic retrieval. Many researchers have proposed author name disambiguation algorithms based on machine learning or on graphs, but these methods suffer from problems such as insufficient feature information and neglect of global structural information. Therefore, aiming to make full use of feature information while taking global structural information into account, this thesis conducts an in-depth study of author name disambiguation in scientific literature databases. The main contributions of this thesis are as follows:

(1) A name disambiguation method based on collaborator features is proposed. First, out of regard for the privacy of author information, this thesis builds homogeneous academic networks from high-level collaboration relationships. Second, to capture global structural information in the network, a network representation learning method based on the co-occurrence of global nodes in the academic network is proposed; it learns a low-dimensional vector representation of each paper. Then, a density-based clustering algorithm divides the papers into clusters so that the papers in each cluster are written by the same author. Finally, experimental results on two real literature datasets show that the proposed method outperforms the comparison methods in disambiguation performance.

(2) A name disambiguation method based on heterogeneous academic networks and semantic features is proposed. First, in view of the heterogeneity of academic literature data, a heterogeneous academic network is constructed from the entity features of the papers. Second, since academic papers carry rich semantic features, this thesis uses semantic features to initialize the paper node vectors. Then, a heterogeneous network representation learning method based on pairwise ranking is used to obtain the feature representations of the nodes. Next, hierarchical clustering is performed for each author name to be disambiguated. Finally, experimental results on a real AMiner dataset show that the proposed method achieves better disambiguation performance.
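Method (1) can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's implementation: the edge rule (two papers are linked when they share at least one co-author), the co-occurrence measure (summed k-step random-walk transition probabilities over the whole graph) and all function names are assumptions made for illustration. The co-occurrence rows are also used directly instead of being compressed into the low-dimensional vectors the thesis learns. Clustering uses plain DBSCAN, a standard density-based algorithm, with cosine distance.

```python
from itertools import combinations
from math import sqrt


def build_paper_graph(coauthors):
    """Homogeneous paper graph (assumed rule): papers i and j are linked when
    their co-author sets overlap."""
    n = len(coauthors)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop, so an isolated paper still gets a vector
    for i, j in combinations(range(n), 2):
        if coauthors[i] & coauthors[j]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cooccurrence_vectors(adj, steps=3):
    """Row i = sum over k = 1..steps of the k-step random-walk transition
    probabilities from paper i (how strongly each paper co-occurs with i)."""
    p = [[v / sum(row) for v in row] for row in adj]  # row-normalise
    power = p
    total = [row[:] for row in p]
    for _ in range(steps - 1):
        power = matmul(power, p)
        total = [[t + q for t, q in zip(t_row, q_row)]
                 for t_row, q_row in zip(total, power)]
    return total


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def dbscan(vectors, eps=0.5, min_samples=2):
    """Plain DBSCAN; label -1 marks noise (a paper left unassigned)."""
    n = len(vectors)
    neighbours = [[j for j in range(n)
                   if cosine_distance(vectors[i], vectors[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


# Co-author sets (excluding the ambiguous name) of seven papers.
papers = [{"Li", "Wang"}, {"Li"}, {"Wang", "Chen"},
          {"Zhang"}, {"Zhang", "Zhao"}, {"Zhao"}, {"Sun"}]
labels = dbscan(cooccurrence_vectors(build_paper_graph(papers)))
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Papers connected through shared collaborators, even indirectly, fall into one cluster, while the paper with no shared collaborators is labelled noise (-1). The values of `eps`, `min_samples` and `steps` are illustrative and would need tuning on real data.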
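The representation-learning step of method (2) can be sketched in the same way. This is an illustrative assumption, not the thesis's model: each paper is linked to hypothetical author, venue and keyword nodes, L2-normalised title bag-of-words vectors stand in for the semantic initialisation, and the pairwise-ranking objective is a BPR-style (Bayesian personalised ranking) loss that asks each paper to score a paper sharing an entity node with it above a paper sharing none.

```python
import math
import random


def build_hetero_network(papers):
    """Map each paper to the typed entity nodes it links to in a heterogeneous
    network (author, venue and keyword nodes; hypothetical schema)."""
    return [
        {("author", a) for a in p["authors"]}
        | {("venue", p["venue"])}
        | {("keyword", k) for k in p["keywords"]}
        for p in papers
    ]


def semantic_init(titles):
    """Initial paper vectors: title bag-of-words over a shared vocabulary,
    L2-normalised (a stand-in for pretrained semantic embeddings)."""
    vocab = sorted({w for t in titles for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in titles:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bpr_step(emb, i, j, k, lr=0.05):
    """One SGD step on -log sigmoid(e_i.e_j - e_i.e_k): positive paper j
    should be ranked above negative paper k for anchor paper i."""
    ei, ej, ek = emb[i], emb[j], emb[k]
    x = dot(ei, ej) - dot(ei, ek)
    g = 1.0 / (1.0 + math.exp(x))  # = 1 - sigmoid(x), the gradient scale
    emb[i] = [a + lr * g * (b - c) for a, b, c in zip(ei, ej, ek)]
    emb[j] = [b + lr * g * a for a, b in zip(ei, ej)]
    emb[k] = [c - lr * g * a for a, c in zip(ei, ek)]


def train(entities, emb, epochs=50, seed=0):
    """Sample one (positive, negative) pair per anchor paper per epoch."""
    rng = random.Random(seed)
    n = len(entities)
    for _ in range(epochs):
        for i in range(n):
            pos = [j for j in range(n) if j != i and (entities[i] & entities[j])]
            neg = [k for k in range(n) if k != i and not (entities[i] & entities[k])]
            if pos and neg:
                bpr_step(emb, i, rng.choice(pos), rng.choice(neg))
    return emb


papers = [
    {"title": "Graph embedding", "authors": ["Li"], "venue": "KDD", "keywords": ["graph"]},
    {"title": "Graph clustering", "authors": ["Li"], "venue": "KDD", "keywords": ["clustering"]},
    {"title": "Protein folding", "authors": ["Zhang"], "venue": "ISMB", "keywords": ["protein"]},
    {"title": "Protein structure", "authors": ["Zhang"], "venue": "ISMB", "keywords": ["structure"]},
]
entities = build_hetero_network(papers)
emb = train(entities, semantic_init([p["title"] for p in papers]))
print(dot(emb[0], emb[1]) > dot(emb[0], emb[2]))  # True
```

Each gradient step pulls the anchor towards its positive and pushes it away from its negative, so papers linked through shared authors, venues or keywords end up with higher mutual scores. A real system would use learned low-dimensional embeddings and meta-path-aware sampling rather than this toy setup.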
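The hierarchical clustering step of method (2) can be sketched as average-linkage agglomerative clustering over the paper vectors that carry one ambiguous name. The cosine distance, the average-linkage criterion, the stopping threshold and the function names are assumptions; the abstract does not specify them.

```python
from math import sqrt


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def average_linkage(vectors, threshold):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until that distance exceeds `threshold`."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # the closest clusters are too far apart: stop merging
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(average_linkage(vectors, threshold=0.3))  # [0, 0, 1, 1]
```

Stopping at a distance threshold, rather than at a fixed number of clusters, avoids having to know in advance how many distinct people share the name. The naive pair search here is only suitable for small examples.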
Keywords/Search Tags: Academic search platform, Author name disambiguation, Network representation learning, Clustering