Author Name Disambiguation Based On Rule And Graph Model

Posted on:2021-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:L Z Zhang

Full Text:PDF

GTID:2428330620976435

Subject:Computer Science and Technology

Abstract/Summary:

Author name disambiguation has long been viewed as a challenging problem in scientific literature management, and with the substantial growth of the scientific literature, solving it has become increasingly difficult and urgent. Although author name disambiguation has been extensively studied in academia and industry, the problem remains largely unresolved because the data are cluttered and same-name scenarios are complex. This thesis studies the author name disambiguation problem in large-scale collections of academic papers. The main contributions are as follows:

(1) A method for constructing a paper relationship graph based on atomic clusters is proposed. Strongly related papers are gathered in advance to form atomic clusters. In the graph, papers and atomic clusters are nodes, and edges are constructed from the relationships between papers and atomic clusters and between papers and papers. This method reduces the scale of the graph.

(2) Paper content information and the relationships between papers are combined for disambiguation. The model first maps papers into a unified embedding space using the feature attributes of each paper; then, for each name reference, it constructs a paper relationship graph. A graph auto-encoder combines the relationship information and the feature attribute information to learn the final paper embeddings. Finally, a hierarchical agglomerative clustering algorithm is run for each name to be disambiguated. Experiments demonstrate that the model provides a significant performance improvement over other methods.

(3) A rule-based disambiguation post-processing algorithm is proposed. The algorithm uses two strong disambiguation features, co-authorship and author affiliation, as rule constraints, and then processes each candidate set of names to be disambiguated on two levels. Experiments show that the algorithm can significantly improve the disambiguation performance of the model when the predicted cluster number (i.e., the predicted number of authors with the same name) is used.

Two experiments are conducted on a public, real-world, large-scale author name disambiguation dataset: 1) with the number of clusters specified (i.e., the actual number of authors per name), the model is compared with existing methods, and the results show an improvement of 3%-10% in F1 value over the other methods; 2) with the number of clusters not specified, each disambiguation model is combined with the proposed post-processing algorithm, and the results show that the post-processing algorithm significantly improves disambiguation performance.
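As a rough illustration of the atomic-cluster idea in contribution (1), the sketch below groups the papers of one ambiguous name with a union-find structure. The abstract does not say how "strongly related" is defined, so the rule used here (two papers share at least `min_shared` co-authors other than the target name) is an assumption, as are the function name `atomic_clusters`, the `min_shared` parameter, and the toy data. It is not the thesis's implementation.

```python
from itertools import combinations


def atomic_clusters(papers, target_name, min_shared=2):
    """Group papers into atomic clusters.

    ASSUMPTION: two papers are "strongly related" when they share at least
    `min_shared` co-authors besides `target_name`. The abstract does not
    define the actual criterion.
    """
    # Union-find parent table: every paper starts in its own cluster.
    parent = {pid: pid for pid in papers}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Co-author sets with the ambiguous name itself removed.
    coauthors = {pid: set(names) - {target_name} for pid, names in papers.items()}

    # Merge every strongly related pair of papers.
    for a, b in combinations(papers, 2):
        if len(coauthors[a] & coauthors[b]) >= min_shared:
            union(a, b)

    # Collect papers by their root to obtain the atomic clusters.
    groups = {}
    for pid in papers:
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy data: paper id -> author list for the name "L Zhang".
papers = {
    "p1": ["L Zhang", "A Chen", "B Liu"],
    "p2": ["L Zhang", "A Chen", "B Liu", "C Wu"],
    "p3": ["L Zhang", "D Sun"],
    "p4": ["L Zhang", "C Wu", "A Chen", "B Liu"],
}
clusters = atomic_clusters(papers, "L Zhang")
print(clusters)
```

In a graph built this way, each atomic cluster could stand in as a single node for the papers it contains, which is one plausible reading of how the method reduces the scale of the graph.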

Keywords/Search Tags:

name disambiguation, word embedding, graph auto-encoder, clustering

Related items

1	Research On Word Sense Disambiguation Method Based On Word Embedding
2	Construction Method Of Sense Embedding Based On Semantic Graph Clustering
3	Structured Auto-encoder Based On Deep Clustering Algorithm Analysis
4	Research On Chinese Word Sense Disambiguation Method Based On Graph Model
5	Context Computing Applications, Word Disambiguation
6	Domain Entity Disambiguation And Link Prediction Based On Representation Learning
7	Research And Application Of Locally Enhanced Attribute Network Embedding Via Deep Auto-encoder
8	Research On Image Representation Via Multi-Graph Embedding
9	Study Of Statistical Process Monitoring Method Based On Auto-Encoder
10	Research On Chinese Person Name Disambiguation Based On Knowledge Graph Embedding