
Research Of Author Name Disambiguation Based On Multi-Type Features And Heterogeneous Information Network Embedding

Posted on: 2024-08-11  Degree: Master  Type: Thesis
Country: China  Candidate: B Wang  Full Text: PDF
GTID: 2568307064496844  Subject: Engineering
Abstract/Summary:
In recent years, as academic research has continued to expand in scale, a growing volume of academic paper data has been added to online digital libraries. An enduring issue within the academic community, however, is author name ambiguity. Because data quality is inconsistent and author information is often missing, papers that share an author name are frequently mixed together and cannot be told apart. Author name ambiguity significantly increases the search cost for researchers retrieving literature and causes confusion in the attribution of research results, which lowers the overall service quality of digital libraries. The task of author name disambiguation, which partitions a set of papers bearing the same author name so that each group corresponds to one real-world researcher, has therefore attracted significant research attention.

Although a number of feature-based and linkage-based author name disambiguation methods have been proposed, they often suffer from the following three limitations: (1) They tend to rely solely on either the text or the relational properties of a paper, and so fail to fully exploit the multi-type features that could be useful for disambiguation. (2) They often treat co-author similarity as a pivotal disambiguation criterion without addressing the ambiguity of the co-author attribute itself, which limits the quality of the disambiguation results. (3) When performing author name disambiguation with network embedding techniques, existing methods construct only homogeneous networks over the local paper data associated with the ambiguous author name, and fail to use the rich semantic information in the global heterogeneous information network.

This study focuses on addressing these limitations of existing work. In this thesis, we propose two novel author name disambiguation methods that integrate multi-type features, leverage the semantics of global heterogeneous information networks, and efficiently manage co-authorship ambiguity. Our work and innovations are summarized as follows:

(1) We propose a semantic and relational features combined author name disambiguation method (SRAND). This method combines the strengths of feature-based and linkage-based methods, enabling the effective incorporation of the multi-type features that are critical for accurate author name disambiguation. The semantic feature embedding module uses a powerful supervised learning model to learn effectively from the large number of negative samples present in the disambiguation dataset. In addition, we design a network embedding model that jointly considers three kinds of relational features. When constructing the co-authorship network, we also estimate the likelihood of co-author ambiguity in the global dataset and incorporate this information into the local network by assigning appropriate weights. This strategy substantially reduces the negative impact of co-author ambiguity on overall disambiguation performance.

(2) Building on SRAND, we further propose a global heterogeneous information network embedding-based author name disambiguation method (GAND) to overcome co-author ambiguity and the lack of global information. After a thorough analysis of the underlying causes of co-author ambiguity, this method expands the local network and incorporates additional global information from reliable co-authors. Furthermore, we propose a heterogeneous information network embedding model that uses meta-path-based random walks. The proposed model effectively integrates both global and local information for disambiguation.

Experimental results on public name disambiguation datasets demonstrate that the two proposed methods achieve remarkable performance compared with existing work.
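To make the idea of meta-path-based random walks concrete, the following is a minimal Python sketch and not the thesis's implementation. The abstract does not specify the node types, the meta-paths, or the embedding model, so everything here is an assumption made for illustration: the node types P (paper), A (author name), and V (venue), the meta-path "PAP", the toy data, and the helper names `build_hin` and `metapath_walk`.

```python
import random
from collections import defaultdict


def build_hin(papers):
    """Build a toy heterogeneous network as typed adjacency lists.

    adj[(node_type, node_id)][neighbor_type] -> list of neighbor ids.
    Node types (assumed for this sketch): P = paper, A = author name, V = venue.
    """
    adj = defaultdict(lambda: defaultdict(list))
    for paper_id, info in papers.items():
        for author in info["authors"]:
            adj[("P", paper_id)]["A"].append(author)
            adj[("A", author)]["P"].append(paper_id)
        venue = info["venue"]
        adj[("P", paper_id)]["V"].append(venue)
        adj[("V", venue)]["P"].append(paper_id)
    return adj


def metapath_walk(adj, start, metapath, walk_length, rng):
    """Random walk whose node types follow a symmetric meta-path such as "PAP".

    The walk repeats the pattern (P -> A -> P -> A ...) until it reaches
    walk_length nodes or hits a node with no neighbor of the required type.
    """
    if metapath[0] != metapath[-1] or start[0] != metapath[0]:
        raise ValueError("meta-path must be symmetric and match the start node type")
    cycle = metapath[:-1]  # "PAP" -> "PA", the repeating unit
    walk = [start]
    node = start
    step = 0
    while len(walk) < walk_length:
        next_type = cycle[(step + 1) % len(cycle)]
        candidates = adj[node][next_type]
        if not candidates:
            break  # dead end: no neighbor of the type the meta-path requires
        node = (next_type, rng.choice(candidates))
        walk.append(node)
        step += 1
    return walk


# Hypothetical toy data, invented for this example only.
toy_papers = {
    "p1": {"authors": ["B Wang", "L Chen"], "venue": "KDD"},
    "p2": {"authors": ["B Wang", "Y Li"], "venue": "WWW"},
    "p3": {"authors": ["L Chen"], "venue": "KDD"},
}
hin = build_hin(toy_papers)
walk = metapath_walk(hin, ("P", "p1"), "PAP", walk_length=5, rng=random.Random(0))
print(walk)
```

Walks of this kind are commonly passed to a skip-gram style model to learn node embeddings, after which the paper embeddings can be clustered per ambiguous name. Whether the thesis follows that exact pipeline is not stated in the abstract.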
Keywords/Search Tags:Author Name Disambiguation, Feature Embedding, Heterogeneous Information Network, Network Embedding