A Research On Scholarly Data Mining Techniques

Posted on:2019-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:J Xu

Full Text:PDF

GTID:2428330611993432

Subject:Computer Science and Technology

Abstract/Summary:

The global academic literature now exceeds 300 million documents. The number of academic workers has reached 100 million and is growing by 13% to 20% per year. However, only about 3% of the data in this literature carries semantic annotation. This lack of semantics greatly reduces the efficiency of academic research: the larger the body of literature, the harder it is for researchers to search and verify prior work, and the more research effort is duplicated and wasted.

This thesis studies the mining of scholar information and focuses on two key tasks: name disambiguation and interest tag discovery. Disambiguating author names improves the accuracy of computing authors' influence. Interest tag discovery speeds up retrieval, supports recommending relevant scholars and articles to users, and reduces the cost of obtaining information. Handling these two problems effectively addresses the slow retrieval and incomplete results of current academic search engines.

The name disambiguation task is to distinguish scholars in the academic literature who share the same name but are different people. This thesis proposes an author name disambiguation model based on network representation learning. In experiments on the three most popular datasets currently available, the model improves Macro-F1 by 5% to 10% over the best-performing existing methods. Because the model transforms attributes into relationships between network nodes, it can be easily extended to other multi-attribute networks.

The interest tag discovery task is to assign interest tags to each author based on the information in that author's articles. This thesis also proposes a scholar interest tag discovery model based on network representation learning. In experiments on the largest manually labeled Aminer dataset, it achieves a 2.7% improvement over the current best-performing method. The model combines the advantages of probabilistic statistical models and deep learning models, and can effectively capture both the global relationships between nodes and the semantic information inside the nodes.
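The abstract reports name disambiguation results as Macro-F1 but does not say how the score is computed. The sketch below assumes one common convention in this area: pairwise precision and recall are computed separately for each ambiguous name, and the per-name F1 scores are then averaged. The function names, the toy names, and the toy labels are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def pairwise_f1(true_labels, pred_labels):
    """Pairwise F1 for the papers published under one ambiguous name.

    true_labels[i] is the real author id of paper i,
    pred_labels[i] is the cluster id the model assigned to paper i.
    """
    same_true = set()  # paper pairs that really share an author
    same_pred = set()  # paper pairs the model placed in one cluster
    for i, j in combinations(range(len(true_labels)), 2):
        if true_labels[i] == true_labels[j]:
            same_true.add((i, j))
        if pred_labels[i] == pred_labels[j]:
            same_pred.add((i, j))

    correct = len(same_true & same_pred)
    precision = correct / len(same_pred) if same_pred else 0.0
    recall = correct / len(same_true) if same_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_name):
    """Average the per-name pairwise F1 so every name counts equally."""
    scores = [pairwise_f1(t, p) for t, p in per_name.values()]
    return sum(scores) / len(scores)


# Hypothetical toy data: name -> (true author ids, predicted cluster ids).
toy = {
    "name_a": ([0, 0, 1, 1], [0, 0, 0, 1]),  # one paper merged into the wrong author
    "name_b": ([0, 1, 1], [0, 1, 1]),        # perfectly disambiguated
}
score = macro_f1(toy)
```

Averaging per name rather than pooling all paper pairs keeps a few very common names, which contribute most of the pairs, from dominating the score.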

Keywords/Search Tags:

Name Disambiguation, Scholar Interest Discovery, Network Embedding, Data Mining

Related items

1	Scholar Interest Labels Mining Based On Network And Text Information
2	Research On Scholar Disambiguation Based On Heterogeneous Information Networks And Fine-Grained Features
3	Design And Implementation Of Scholar Search And Scholar Community Discovery System
4	The Design And Implementation Of Scholar Research Interest Discovery System Based On Topic Model
5	Based On Knowledge Discovery In Time Series Data Mining Algorithm
6	A Study On Methods Of Author Name Disambiguation In Academic Literature
7	Efficiency Improvement Of Mining The Region Of Interest On Moving Object
8	Spatio-temporal Analysis And User Interest Mining Based On Cellular Network Data
9	Evolution Study Of Outstanding Scholar Impact Based On Scholarly Big Data
10	Scholar Resume Automatic Generation Based On Text Mining