Author Name Disambiguation Based Rule And Graph Model

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Zhang
GTID: 2428330620976435
Subject: Computer Science and Technology
Abstract/Summary:
Author name disambiguation has long been viewed as a challenging problem in scientific literature management, and with the substantial growth of the scientific literature, solving it has become increasingly difficult and urgent. Although author name disambiguation has been extensively studied in academia and industry, the problem remains largely unresolved because the data are noisy and the same-name scenarios are complex. This paper studies the author name disambiguation problem for large-scale collections of academic papers. The main research work is as follows:

(1) A method of constructing the paper relationship graph based on atomic clusters is proposed. Strongly related papers are gathered in advance to form an atomic cluster. In the graph, papers and atomic clusters are nodes, and edges are constructed from the relationships between papers and atomic clusters and between papers and papers. This method reduces the scale of the graph.

(2) A disambiguation model is proposed that combines paper content information with the relationships between papers. The model first maps papers into a unified embedding space using the feature attributes of each paper; then, for each name reference, it constructs a paper relationship graph. A graph auto-encoder combines the relationship information and the feature attribute information to learn the final paper embeddings. Finally, a hierarchical agglomerative clustering algorithm is applied to the papers of each name to be disambiguated. Experiments demonstrate that the model provides a significant performance improvement over other methods.

(3) A rule-based disambiguation post-processing algorithm is proposed. The algorithm uses two strong disambiguation features, co-authorship and author affiliation, as rule constraints, and then processes each candidate set of names to be disambiguated on two levels. Experiments show that the algorithm can significantly improve the disambiguation performance of a model when the predicted cluster number (i.e., the predicted number of authors with the same name) is used.

This paper conducts two experiments on a public, real-world, large-scale author name disambiguation dataset. 1) The proposed model is compared with existing methods when the number of clusters is specified (i.e., the actual number of authors per name). The results show that the proposed disambiguation model improves the F1 value by 3%-10% compared to other methods. 2) When the number of clusters is not specified, each disambiguation model is combined with the proposed post-processing algorithm. The results show that the post-processing algorithm can significantly improve disambiguation performance.
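The following is a minimal sketch of two steps named in the abstract: forming atomic clusters from strongly related papers, and then running hierarchical agglomerative clustering over paper embeddings. It is not the thesis implementation. The abstract does not define "strongly related", so the rule used here (two papers share at least `min_shared` co-authors) is an assumption, as are the function names `atomic_clusters` and `agglomerate`, the choice of average-linkage cosine similarity, and the toy data. The hand-written vectors stand in for the embeddings that the graph auto-encoder would learn.

```python
from itertools import combinations
import math


def atomic_clusters(papers, min_shared=2):
    """Group papers that share at least `min_shared` co-authors.

    The sharing rule is an assumed stand-in for "strongly related".
    """
    parent = {pid: pid for pid in papers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        shared = papers[a]["coauthors"] & papers[b]["coauthors"]
        if len(shared) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for pid in papers:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(g) for g in groups.values())


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def agglomerate(clusters, embeddings, n_clusters):
    """Average-linkage agglomerative clustering seeded with atomic clusters."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(embeddings[p], embeddings[q])
                    for p in clusters[i] for q in clusters[j]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Toy data (hypothetical): four papers that all carry the same ambiguous name.
papers = {
    "p1": {"coauthors": {"A. Li", "B. Wang"}},
    "p2": {"coauthors": {"A. Li", "B. Wang", "C. Chen"}},
    "p3": {"coauthors": {"D. Zhao"}},
    "p4": {"coauthors": {"E. Sun"}},
}
embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.9, 0.1],
    "p3": [0.8, 0.3],
    "p4": [0.0, 1.0],
}

atoms = atomic_clusters(papers)
result = agglomerate(atoms, embeddings, n_clusters=2)
print(atoms)   # [['p1', 'p2'], ['p3'], ['p4']]
print(result)  # [['p1', 'p2', 'p3'], ['p4']]
```

Seeding the clustering with atomic clusters means fewer starting groups than papers, which mirrors the abstract's point that atomic clusters reduce the scale of the problem. Here `n_clusters` plays the role of the specified number of authors per name.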
Keywords/Search Tags:name disambiguation, word embedding, graph auto-encoder, clustering