
Research And Implementation Of The Disambiguation Method With The Same Name In The Expert Database

Posted on: 2021-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: M Jiang  Full Text: PDF
GTID: 2428330611465678  Subject: Software engineering
Abstract/Summary:
With the development of network technology, the Internet has become a huge source of information. Information about science and technology experts is scattered across various science and technology information systems; by collecting it, a web-based science and technology expert database can be built to support expert search, expert selection, and other topic-oriented science and technology services. However, expert records drawn from different information systems are duplicated and vary in data quality. Ambiguity in expert names not only reduces the accuracy of expert retrieval but also seriously affects subsequent analysis. Experts collected from the Internet can therefore be stored in the expert database only after same-name disambiguation has been carried out. Same-name disambiguation distinguishes the real individuals who are confused under an ambiguous name. Following the requirements of the research group, this thesis takes the authors of scientific papers in the expert database as its research object. Using the complex collaborator relationships in scientific papers, combined with the characteristics of researchers' research fields, it proposes a same-name disambiguation algorithm based on collaborator relationships and a stepwise disambiguation algorithm based on trusted collaborator relationships. The main work of the thesis is as follows:

(1) A name disambiguation algorithm based on collaborator relationships is proposed. Traditional disambiguation methods based on common attributes cannot accurately measure the similarity of authors, whereas the collaborator feature is a strong feature for characterizing an author. The algorithm therefore first builds a collaborator association graph from the collaborator relationships, and then uses the multi-path characteristics of the graph to calculate the similarity of same-name authors on the collaborator feature. Because the collaborator feature alone cannot disambiguate well for same-name authors with few papers and few collaborators, the thesis also designs a method for calculating domain feature similarity based on scientific and technological terms to further improve the disambiguation effect. This method first recognizes the domain information in each paper using scientific and technological entries and calculates its relevance, then builds a domain feature model on a tree-graph model of those entries, improves the completeness of each author's domain information through domain node expansion, and finally calculates the similarity of same-name authors on the domain feature. Based on the collaborator feature similarity and the domain feature similarity, the same-name authors with the highest similarity are merged using hierarchical clustering.

(2) A stepwise disambiguation algorithm based on trusted collaborator relationships is designed to solve the problem of collaborators with the same name. The algorithm disambiguates in two stages. It first gives the definition of a trusted collaborator relationship and a method for judging it; same-name authors who satisfy this relationship are clustered and merged, which completes the first stage. The second stage builds on the result of the first: a collaborator bipartite graph is constructed from the collaborator relationships, and the SimRank algorithm is then applied to the overall structure of this bipartite graph to calculate the similarity of same-name authors on the collaborator feature. A comprehensive similarity is then calculated from the feature similarities, and the same-name authors with the highest similarity are merged through hierarchical clustering, which completes the second stage of same-name disambiguation.
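
To make the second-stage similarity step concrete, the following is a minimal sketch of the textbook SimRank recurrence applied to a bipartite graph of same-name author records and their collaborators. It is not the thesis's implementation: the function name `simrank_bipartite`, the decay factor of 0.8, the fixed iteration count, and the example records are all assumptions chosen for illustration, and the first-stage trusted-collaborator merging and the domain feature similarity are not modeled here.

```python
from itertools import product


def simrank_bipartite(edges, decay=0.8, iterations=10):
    """Textbook SimRank on an undirected bipartite graph.

    edges: iterable of (author_record, collaborator) pairs.
    decay and iterations are illustrative defaults (assumptions).
    Returns a dict mapping (record_a, record_b) to a similarity score.
    """
    # Tag each node with its side so a record and a collaborator that
    # happen to share a label never collide.
    neighbors = {}
    for record, collaborator in edges:
        neighbors.setdefault(("rec", record), set()).add(("col", collaborator))
        neighbors.setdefault(("col", collaborator), set()).add(("rec", record))

    nodes = list(neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}

    for _ in range(iterations):
        new_sim = {}
        for u, v in product(nodes, repeat=2):
            if u == v:
                new_sim[(u, v)] = 1.0
            elif u[0] != v[0]:
                # Nodes on opposite sides of the bipartite graph are
                # never compared; their score stays at zero.
                new_sim[(u, v)] = 0.0
            else:
                # Average similarity over all pairs of neighbors,
                # scaled by the decay factor.
                total = sum(sim[(i, j)]
                            for i in neighbors[u] for j in neighbors[v])
                new_sim[(u, v)] = decay * total / (
                    len(neighbors[u]) * len(neighbors[v]))
        sim = new_sim

    # Keep only the record-to-record scores.
    return {(u[1], v[1]): score
            for (u, v), score in sim.items()
            if u[0] == "rec" and v[0] == "rec"}


# Hypothetical same-name records and their collaborators.
example_edges = [
    ("Wei Zhang #1", "Li"), ("Wei Zhang #1", "Chen"),
    ("Wei Zhang #2", "Li"), ("Wei Zhang #2", "Chen"),
    ("Wei Zhang #3", "Kumar"),
]
scores = simrank_bipartite(example_edges)
```

In this toy input, records #1 and #2 share both collaborators and receive a positive score, while record #3 has no path to the others and scores zero against them. A hierarchical clustering step could then merge the highest-scoring pairs first, as the abstract describes.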
Keywords/Search Tags: Name Disambiguation, Partner Relationship, Similarity Calculation, Hierarchical Clustering