
Research On Scholar Disambiguation Based On Heterogeneous Information Networks And Fine-Grained Features

Posted on: 2020-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: C R Li
GTID: 2428330590961099
Subject: Computer technology
Abstract/Summary:
The sharing of academic resources has made researchers increasingly dependent on public academic databases. Duplicate names are common in bibliographic records, and recording conventions differ across cultures, so these databases contain a large number of scholars who share the same name. Name ambiguity has therefore become a major obstacle to retrieving academic resources. Existing scholar name disambiguation solutions have several problems:

(i) Insufficient use of available information. In representation learning, features such as journals, and the types of relationships between authors and journals, are not adequately considered. The resulting representation model is too simple to fully describe the entities to be disambiguated.
(ii) Homogeneous (isomorphic) algorithms cannot effectively represent heterogeneous features. Relations such as citations and authored works differ from one another, as do attributes such as the publishing journal and the abstract. Existing homogeneous algorithms cannot accurately extract these heterogeneous features of the literature.
(iii) Poor fault tolerance. Many models do not account for missing features, which makes them difficult to apply directly in real-world scenarios.

To address these problems, this thesis proposes several scholar name disambiguation methods based on heterogeneous information networks and fine-grained features:

(i) A multi-feature and relation fusion-based author disambiguation method (MFRAD). In addition to the authored-works and co-authorship relations commonly used by scholar disambiguation algorithms, this method also considers information such as citations, affiliations, and abstracts. It constructs multiple heterogeneous information networks and combines structural and textual information to extract document features comprehensively. We also design a scalable loss function based on pairwise constraints to learn representations of the network information, so that the model can adapt to different datasets.
(ii) A heterogeneous relation-aware network embedding model (HRANE) that addresses the limitations of a single representation model. This study analyzes how individual document features influence name disambiguation and how relationship types differ. We construct heterogeneous relationship networks of different strengths to constrain the learning of document features, in order to reduce the harm that incomplete networks, caused by missing strong features, do to scholar disambiguation.
(iii) A heterogeneous relation-aware and feature-enhanced network embedding model (HRFENE) that makes more efficient use of weak features. HRFENE retains the strong-feature networks (co-authorship, citation, and authored works) and strong features such as periodicals. It uses both weak and strong features as node attributes in the strong-feature networks, and it iteratively learns network structure information and node attribute information to better represent the entities to be disambiguated. The complexity of the model is also analyzed.
(iv) An experimental validation of the proposed network representation models on public datasets. Compared with the comparative model achieving the highest Macro-F1, HRFENE improves Macro-F1 by 19.27% on the Aminer dataset and by 10.96% on the DBLP dataset. For the disambiguation of individual names, the improvement reaches up to 38.71%.

Based on the above models, this thesis also builds a semi-automated scholar name disambiguation framework. By optimizing the feature clustering algorithm and adding a manual feedback loop, the framework performs scholar name disambiguation efficiently and accurately.
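MFRAD builds several heterogeneous information networks, and HRANE uses relationship networks of different strengths. One plausible way to derive such networks for a single ambiguous name is to connect papers pairwise per relation type and then fuse the graphs with per-relation weights. The record fields, the four relation types, and the `RELATION_WEIGHTS` values are hypothetical illustration choices, not the thesis's configuration. The thesis learns embeddings over these networks rather than simply summing edge weights.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records for papers that all list the same ambiguous author name.
# Co-author sets exclude the ambiguous name itself; field names are assumptions.
PAPERS = {
    "p1": {"coauthors": {"Wang Fang"}, "venue": "KDD", "org": "Tsinghua", "refs": {"p2"}},
    "p2": {"coauthors": {"Wang Fang", "Zhao Lei"}, "venue": "KDD", "org": "Tsinghua", "refs": set()},
    "p3": {"coauthors": {"Chen Jie"}, "venue": "ICML", "org": "Peking", "refs": set()},
}

# Hypothetical relation strengths: "strong" relations get larger weights.
RELATION_WEIGHTS = {"coauthor": 1.0, "citation": 1.0, "venue": 0.5, "org": 0.3}


def build_relation_networks(papers):
    """Return {relation: {(paper_a, paper_b): weight}}, one paper graph per relation."""
    nets = defaultdict(dict)
    for a, b in combinations(sorted(papers), 2):
        pa, pb = papers[a], papers[b]
        shared = len(pa["coauthors"] & pb["coauthors"])
        if shared:  # papers share at least one co-author
            nets["coauthor"][(a, b)] = float(shared)
        if b in pa["refs"] or a in pb["refs"]:  # one paper cites the other
            nets["citation"][(a, b)] = 1.0
        if pa["venue"] == pb["venue"]:  # published in the same venue
            nets["venue"][(a, b)] = 1.0
        if pa["org"] == pb["org"]:  # same affiliation listed
            nets["org"][(a, b)] = 1.0
    return dict(nets)


def fuse(nets, weights):
    """Merge relation graphs into one graph, scaling each edge by its relation weight."""
    fused = defaultdict(float)
    for relation, edges in nets.items():
        for pair, w in edges.items():
            fused[pair] += weights[relation] * w
    return dict(fused)


if __name__ == "__main__":
    print(fuse(build_relation_networks(PAPERS), RELATION_WEIGHTS))
```

Here `p1` and `p2` are linked by every relation type, and `p3` stays isolated. This is the kind of structure a downstream embedding or clustering step would separate into two candidate authors.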
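The abstract describes MFRAD's objective only as a scalable loss based on pairwise constraints. As an illustration, the sketch below uses a generic margin-based contrastive loss over must-link (same author) and cannot-link (different authors) paper pairs. This standard technique is swapped in for explanation, and the thesis's exact loss may differ. The function name and the `margin` default are hypothetical.

```python
import math


def pairwise_constraint_loss(emb, must_link, cannot_link, margin=1.0):
    """Contrastive-style loss averaged over constraint pairs.

    emb: {paper_id: embedding vector as a list of floats}
    must_link: pairs believed to share an author, pulled together (squared distance)
    cannot_link: pairs believed to differ, pushed at least `margin` apart (squared hinge)
    """
    pos = sum(math.dist(emb[a], emb[b]) ** 2 for a, b in must_link)
    neg = sum(max(0.0, margin - math.dist(emb[a], emb[b])) ** 2 for a, b in cannot_link)
    n = len(must_link) + len(cannot_link)
    return (pos + neg) / n if n else 0.0


if __name__ == "__main__":
    emb = {"p1": [0.0, 0.0], "p2": [0.0, 0.5], "p3": [3.0, 4.0]}
    collapsed = {k: [0.0, 0.0] for k in emb}  # every paper mapped to one point
    ml, cl = [("p1", "p2")], [("p1", "p3"), ("p2", "p3")]
    print(pairwise_constraint_loss(emb, ml, cl), pairwise_constraint_loss(collapsed, ml, cl))
```

The loss only touches the sampled constraint pairs, so its cost grows with the number of constraints rather than with all O(n²) paper pairs. That is one way such a loss can stay scalable across datasets of different sizes.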
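The semi-automated framework combines feature clustering with a manual feedback loop. The sketch below assumes the learned representations have already been turned into a weighted paper-similarity graph. It applies greedy threshold clustering with union-find, and human feedback arrives as must-link and cannot-link pairs. The threshold, the `cluster` function, and the feedback format are hypothetical, since the abstract does not specify the optimized clustering algorithm.

```python
def cluster(nodes, edges, threshold, must_link=(), cannot_link=()):
    """Group papers into author clusters.

    edges: {(a, b): similarity}; edges at or above `threshold` merge clusters,
    strongest first. Manual feedback: must_link pairs are merged up front, and
    no merge is allowed that would put a cannot_link pair in the same cluster.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def violates(ra, rb):
        # Merging roots ra and rb is forbidden if a cannot-link pair spans them.
        return any({find(x), find(y)} == {ra, rb} for x, y in cannot_link)

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb and not violates(ra, rb):
            parent[rb] = ra

    for a, b in must_link:
        union(a, b)
    for (a, b), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if w >= threshold:
            union(a, b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    nodes = ["p1", "p2", "p3", "p4"]
    edges = {("p1", "p2"): 2.8, ("p2", "p3"): 0.6, ("p3", "p4"): 1.5}
    print(cluster(nodes, edges, threshold=1.0))
    print(cluster(nodes, edges, threshold=1.0, cannot_link=[("p3", "p4")]))
```

Re-running `cluster` after each round of feedback is a simple way to realize the "artificial feedback link". A reviewer corrects a few pairs, and the constraints propagate to the rest of the clustering.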
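The reported results use Macro-F1, but the abstract does not define it. Author name disambiguation work commonly uses pairwise precision, recall, and F1 computed separately for each ambiguous name, then averaged over names. The sketch below assumes that definition. The function names and toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_f1(pred, truth):
    """Pairwise F1 for one ambiguous name.

    pred, truth: {paper_id: label}; two papers form a positive pair when they
    share a label. Returns 1.0 when neither side has any positive pair
    (a convention for the all-singleton case).
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(truth), 2):
        same_pred, same_true = pred[a] == pred[b], truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp + fp + fn == 0:
        return 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f1(results):
    """Average pairwise F1 over names; results: {name: (pred, truth)}."""
    scores = [pairwise_f1(p, t) for p, t in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
    pred = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
    print(pairwise_f1(pred, truth), macro_f1({"Li Chen": (pred, truth), "Wang Wei": (truth, truth)}))
```

In the toy case there is 1 true positive, 2 false positives, and 1 false negative. That gives precision 1/3, recall 1/2, and F1 0.4 for the first name. Averaging with a perfect second name gives a Macro-F1 of 0.7.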
Keywords: Name Disambiguation, Entity Recognition, Network Embedding, Clustering