Research On Disambiguation Of Same Authors In Academic Collaboration Network

Posted on: 2021-04-05 | Degree: Master | Type: Thesis | Country: China | Candidate: Y X Zhang | GTID: 2518306461970529 | Subject: Computer technology

Abstract/Summary:

With the rapid development of science and technology, more and more academic papers are published on the Internet, and the volume of paper data held in academic network systems keeps growing. In the real world, many different people share the same name, so the academic output of distinct authors with identical names is often wrongly aggregated under a single identity. A paper database with such errors seriously degrades paper retrieval and can also lead to wrong attribution of credit and responsibility in future digital forensics. Disambiguating same-name authors in academic collaboration networks is therefore particularly important.

To resolve the ambiguity among different authors with the same name in a large academic database, this thesis proposes an algorithm based on metapath heterogeneous network embedding. The first step is data processing for the academic collaboration network: only the feature attributes that strongly influence the name disambiguation result are retained, and they are stored in an efficient format in a local system file. Once this file is generated, a GRU-based encoder learns a preliminary representation of each academic paper title, which contains author and publication information. A heterogeneous academic paper network is then constructed from the connections between paper nodes. The algorithm applied to this network draws on the ideas of DeepWalk and metapath2vec. Its core idea is to generate a library of training paths by performing metapath-guided random walks in the heterogeneous network. The representation vectors of the nodes in the heterogeneous network are then learned from these training paths, yielding a low-dimensional representation that captures both the relationships between academic papers and the semantic relationships of their text. Finally, the papers are clustered using cosine similarity, and each resulting cluster gives the author category of its papers, which completes the disambiguation of same-name authors.

On this basis, an academic cooperation network system is built to eliminate the ambiguity between authors with the same name. The experimental results show that the proposed disambiguation algorithm based on metapath heterogeneous network embedding achieves high accuracy.

Keywords/Search Tags: Natural language processing, Disambiguation of the same author, DBLP dataset, Heterogeneous Neural Network, GRU, Meta Path, Random Walk
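
The metapath-guided random walk that the abstract describes as generating the library of training paths can be illustrated with a short sketch. This is not the thesis's implementation: the toy graph, the node types ("P" for paper, "A" for coauthor, "V" for venue), the two metapaths, and the function names `metapath_walk` and `build_corpus` are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy heterogeneous graph: node id -> (node type, neighbor ids).
# Types are assumptions: "P" = paper, "A" = coauthor, "V" = venue.
GRAPH = {
    "p1": ("P", ["a1", "v1"]),
    "p2": ("P", ["a1", "v1"]),
    "p3": ("P", ["a2", "v2"]),
    "a1": ("A", ["p1", "p2"]),
    "a2": ("A", ["p3"]),
    "v1": ("V", ["p1", "p2"]),
    "v2": ("V", ["p3"]),
}


def metapath_walk(graph, start, metapath, walk_length, rng):
    """Random walk from `start` that follows the node types in `metapath`.

    `metapath` is a type sequence whose first and last types match,
    e.g. ["P", "A", "P"], so it can be repeated cyclically.
    """
    if graph[start][0] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    walk = [start]
    period = len(metapath) - 1
    step = 0
    while len(walk) < walk_length:
        # Type required for the next node, cycling through the metapath.
        next_type = metapath[(step % period) + 1]
        candidates = [n for n in graph[walk[-1]][1] if graph[n][0] == next_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


def build_corpus(graph, metapaths, walks_per_node, walk_length, seed=0):
    """Collect walks for every node whose type starts one of the metapaths."""
    rng = random.Random(seed)
    corpus = []
    for node, (node_type, _) in graph.items():
        for metapath in metapaths:
            if node_type != metapath[0]:
                continue
            for _ in range(walks_per_node):
                corpus.append(
                    metapath_walk(graph, node, metapath, walk_length, rng)
                )
    return corpus


if __name__ == "__main__":
    paths = build_corpus(
        GRAPH, [["P", "A", "P"], ["P", "V", "P"]], walks_per_node=2, walk_length=5
    )
    for path in paths:
        print(" -> ".join(path))
```

In metapath2vec-style methods, walks like these are treated as sentences and passed to a skip-gram model to learn node vectors. In this toy graph, papers that share a coauthor or venue (`p1`, `p2`) appear in the same walks, while `p3` never does, which is the kind of signal the later cosine-similarity clustering relies on.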