
Research On Graph Neural Network-Based Name Disambiguation Algorithm

Posted on: 2024-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Tang
GTID: 2568307115495264
Subject: Electronic information
Abstract/Summary:
In recent years, as the research capacity of Chinese universities and enterprises has grown, Chinese scholars have published an increasing number of papers in foreign-language journals. Their names usually appear in pinyin, so more and more distinct authors share the same written name. This author name ambiguity severely reduces the efficiency and accuracy of academic literature retrieval. A more accurate disambiguation algorithm matters for analyzing the composition of researchers in each field, studying interdisciplinary researchers, and optimizing search engines, so disambiguating same-name authors is a crucial task. To address the problem of Chinese authors who share a name in English-language literature, this thesis proposes a name disambiguation algorithm based on graph neural networks, together with a system that implements it. It also proposes a hybrid algorithm for disambiguating authors' institution names, which further improves the accuracy of the name disambiguation results. The main work is summarized as follows:

(1) To address same-name author disambiguation, this thesis proposes a name disambiguation method based on graph neural networks. The algorithm first uses a pinyin-initial-based method to resolve cases where one author appears under multiple name forms, which yields the training sample data. Then, each document to be disambiguated is treated as a node of the network, the document's embedding vector is obtained with the Word2Vec word vector model, and the strong correlations between document attribute features are used to build disambiguation feature pairs, from which the academic relationship network is constructed. Finally, a graph autoencoder produces a representation vector that captures both the features of the documents themselves and the relationships between documents. In addition, the algorithm enhances the embeddings of document nodes by weighting different meta-paths, and different neighbouring nodes on the same meta-path, according to their importance to the central node. It then disambiguates same-name authors with a hierarchical clustering algorithm. Experiments on the AMiner dataset show that the proposed algorithm achieves an average F1 of 70.06%, outperforming several mainstream name disambiguation methods.

(2) To address confusion among author institution names, this thesis proposes a hybrid institution name disambiguation algorithm. First, a maximum matching algorithm preprocesses the institution names, and the English names are translated into Chinese using translation software. Next, the institution names are normalized using the mapping capability of a search engine together with edit distance. Finally, the institution names are split according to different matching patterns, and a CoSNET pre-trained model combined with rules is used for disambiguation. Experiments on a Chinese institution dataset show that the proposed algorithm achieves an average F1 of 95.3% and effectively improves the accuracy of the name disambiguation results.

(3) Based on the above algorithms, a name disambiguation system for academic big data is built. The system applies the algorithms proposed in this thesis to disambiguate literature data and stores the results in a database for querying. In practice, users can enter a scholar's name to view the disambiguation results and the scholar's personal details.
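
As an illustration of the edit-distance part of the institution name normalization in (2), the following is a minimal sketch, not the thesis's implementation. The function names, the list of canonical names, and the `max_ratio` threshold are all assumptions made for this example; the abstract does not specify how edit distance is combined with the search-engine mapping.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize_institution(name: str,
                          canonical_names: List[str],
                          max_ratio: float = 0.2) -> Optional[str]:
    """Map a raw name to its closest canonical name, or None if too far.

    max_ratio is a hypothetical threshold: the best match is accepted
    only if its distance is at most max_ratio times the longer length.
    """
    key = name.strip().lower()
    best, best_dist = None, None
    for cand in canonical_names:
        d = edit_distance(key, cand.lower())
        if best_dist is None or d < best_dist:
            best, best_dist = cand, d
    if best is not None and best_dist <= max_ratio * max(len(key), len(best)):
        return best
    return None
```

With a canonical list such as `["Tsinghua University", "Peking University"]`, a misspelled input like `"Tsinghua Univercity"` maps to the first entry, while an unrelated string returns `None` and would be left for the later rule-based and model-based steps.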
Keywords/Search Tags: name disambiguation, graph embedding, disambiguation feature pairs, heterogeneous academic relationship network, meta-path neural networks