
A Research On Scholarly Data Mining Techniques

Posted on: 2019-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2428330611993432
Subject: Computer Science and Technology
Abstract/Summary:
The global academic literature now exceeds 300 million publications, and the number of academic workers has reached 100 million and is growing by 13% to 20% per year. However, only about 3% of the data in this literature carries semantic annotation. The lack of semantics greatly reduces the efficiency of academic research: the more literature there is, the harder it becomes for researchers to consult and verify it, and the more research effort is duplicated and wasted.

This thesis studies scholar information mining and focuses on two key tasks: name disambiguation and interest tag discovery. Disambiguating author names improves the accuracy with which authors' influence can be computed. Discovering interest tags speeds up retrieval, makes it possible to recommend relevant scholars and articles to users, and reduces the cost of obtaining information. Handling these two problems effectively addresses the slow retrieval and incomplete results of current academic search engines.

The name disambiguation task is to distinguish scholars who share the same name in the academic literature but are not the same person. This thesis proposes an author name disambiguation model based on network representation learning. In experiments on the three most popular datasets currently available, the model improves Macro-F1 by 5% to 10% over the best-performing existing methods. Because the model transforms attributes into relationships between network nodes, it can be easily extended to other multi-attribute networks.

The interest tag discovery task is to assign interest tags to each author based on the information in that author's articles. This thesis proposes a scholar interest tag discovery model based on network representation learning. In experiments on the largest manually annotated AMiner dataset, the model achieves a 2.7% improvement over the current best-performing method. The model combines the advantages of probabilistic and statistical models with those of deep learning models, and can effectively capture both the global relationships between nodes and the semantic information inside the nodes.
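The idea of transforming attributes into relationships between network nodes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the attribute names (`coauthors`, `venue`, `keywords`), the toy records, and the helper functions `build_paper_graph` and `connected_components` are hypothetical and do not come from the thesis. The abstract describes a model that learns network representations; the union-find grouping here is only a crude stand-in for clustering such learned embeddings.

```python
from collections import defaultdict
from itertools import combinations


def build_paper_graph(papers, attributes=("coauthors", "venue", "keywords")):
    """Link two papers whenever they share a value for any listed attribute.

    Returns a dict mapping each sorted paper-id pair to the number of
    shared attribute values, which serves as the edge weight.
    """
    # Inverted index: (attribute, value) -> ids of papers carrying that value.
    index = defaultdict(set)
    for pid, attrs in papers.items():
        for attr in attributes:
            for value in attrs.get(attr, ()):
                index[(attr, value)].add(pid)

    # Every pair of papers that shares a value gets an edge (or a heavier one).
    edges = defaultdict(int)
    for pids in index.values():
        for a, b in combinations(sorted(pids), 2):
            edges[(a, b)] += 1
    return dict(edges)


def connected_components(nodes, edges):
    """Group nodes with union-find (stand-in for clustering learned embeddings)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy records for three papers that all list an author with the same name.
papers = {
    "p1": {"coauthors": ["L. Wang"], "venue": ["KDD"], "keywords": ["graph embedding"]},
    "p2": {"coauthors": ["L. Wang"], "venue": ["WWW"], "keywords": ["graph embedding"]},
    "p3": {"coauthors": ["M. Chen"], "venue": ["CVPR"], "keywords": ["image segmentation"]},
}
edges = build_paper_graph(papers)
clusters = connected_components(papers, edges)
```

In this toy input, `p1` and `p2` share a coauthor and a keyword, so they are joined by an edge of weight 2 and grouped as one author, while `p3` stays separate. Because each attribute only contributes edges, adding a new attribute type means adding a name to `attributes`, which is one way to read the abstract's claim about extending to other multi-attribute networks.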
Keywords/Search Tags: Name Disambiguation, Scholar Interest Discovery, Network Embedding, Data Mining