The Research On Academic Paper Author Name Disambiguation

Posted on: 2017-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: S Qiu
Full Text: PDF
GTID: 2348330485481326
Subject: Computer application technology
Abstract/Summary:
Academic research increasingly depends on digital libraries such as DBLP. However, most digital libraries return incorrect results when searched by author name, a problem caused by author name ambiguity. In this thesis, we first formally define the key concepts of author name disambiguation. Building on a study of traditional disambiguation methods, we propose a hierarchical clustering algorithm based on high-confidence features. Furthermore, we propose an author-related topic model based on the semantics of academic papers. The main research contents are as follows.

To assess the strengths, weaknesses, and scope of application of traditional methods, and to provide data for further research, we first constructed a test data set and proposed evaluation criteria for it. We then studied naming rules in different language families and proposed an approach for constructing ambiguous groups to address the synonymy problem in author name ambiguity. We implemented a separate algorithm for each feature and derived a confidence ranking of the features from the results.

Based on these results, we designed and implemented a hierarchical clustering method based on high-confidence features. In our method, a different similarity function is selected for each feature, and each round of clustering merges several clusters using heuristic rules. Compared with traditional hierarchical clustering methods, our method achieves higher average precision and recall, which increase by 10.7% and 2.9% respectively. It is also more efficient.

To address the problem that most traditional semantic disambiguation methods ignore the distribution of paper topics, we propose a novel author-related topic model based on academic paper semantics. In our model, we first train on the test data set and construct a topic tree. The paper set is then mapped to the corresponding topics and the corresponding topic trees are generated. Finally, academic papers are clustered by calculating the similarity of their corresponding topics in their topic trees. Our experiments demonstrate that the semantics of academic papers can be mined effectively when the topic distribution is taken into account, and as a result our method achieves better performance on author name disambiguation.

Finally, to validate our method, we applied it in the TLDW system. Analysis of the search results shows that our method effectively reduces author name ambiguity and yields more accurate results.
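The feature-ranked hierarchical clustering described above can be illustrated with a minimal sketch. The abstract does not give the actual similarity functions, thresholds, or heuristic rules, so everything concrete here is an assumption: Jaccard similarity stands in for every feature, the feature names and thresholds are hypothetical, and the "merge several clusters per round" rule is approximated by merging all cluster pairs whose similarity reaches the threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_papers(papers, features):
    """Cluster papers sharing an ambiguous author name.

    papers:   list of dicts mapping a feature name to a set of values.
    features: list of (name, threshold) pairs, ordered from the highest
              to the lowest confidence (hypothetical ranking).
    Returns a list of sets of paper indices.
    """
    clusters = [{i} for i in range(len(papers))]
    for name, threshold in features:
        merged = True
        while merged:  # repeat rounds until this feature yields no merge
            merged = False
            # Pool the feature values of all papers in each cluster.
            pooled = [set().union(*(papers[i][name] for i in c))
                      for c in clusters]
            parent = list(range(len(clusters)))

            def find(x):
                # Union-find root lookup with path halving.
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x

            # One round may merge several clusters at once.
            for a, b in combinations(range(len(clusters)), 2):
                if jaccard(pooled[a], pooled[b]) >= threshold:
                    ra, rb = find(a), find(b)
                    if ra != rb:
                        parent[rb] = ra
                        merged = True

            groups = {}
            for idx, c in enumerate(clusters):
                groups.setdefault(find(idx), set()).update(c)
            clusters = list(groups.values())
    return clusters


# Illustrative data only; names and venues are made up.
papers = [
    {"coauthors": {"A. Li", "B. Chen"}, "venue": {"KDD"}},
    {"coauthors": {"A. Li", "B. Chen"}, "venue": {"ICDM"}},
    {"coauthors": {"C. Wu"}, "venue": {"ICDM"}},
    {"coauthors": {"D. Sun"}, "venue": {"SIGGRAPH"}},
]
features = [("coauthors", 0.5), ("venue", 0.5)]
clusters = cluster_papers(papers, features)
```

Processing the most trusted feature first means that early merges rest on strong evidence, and weaker features only refine clusters that are already partly formed. Pooling values across a cluster is one simple design choice; a per-feature similarity function, as the thesis describes, could be substituted for `jaccard` without changing the loop.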
Keywords/Search Tags: Author Name Disambiguation, Hierarchical Clustering, Heuristic Rules, Semantic Extraction, Topic Tree