Potential Scholars Mining Based On Big Scholarly Data

Posted on: 2021-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2428330629988922
Subject: Engineering
Abstract/Summary:
The arrival of the big data era has accelerated the development of many industries and has produced a large volume of data related to academic research, known as big scholarly data. A great deal of research has been carried out on the basis of this data. However, few studies predict scholars' future performance from their academic record at the beginning of their careers. This thesis therefore proposes the idea of potential scholars mining: predicting a scholar's future academic performance from data covering the start of his or her academic career. The results can serve as a reference for evaluating the potential academic ability of researchers. The work is divided into the following two parts:

(1) Combined feature construction based on GBDT. The time at which a scholar published a paper as first or second author is taken as the start of the academic career, and the data from the following 5 years is used as the basis for judgment. 16 features are selected as the evaluation basis, and the scholar data set on AMiner is used as the base data set. Missing data is supplemented from MAG, and the optimal positive-to-negative sample ratio, 1:2, is determined through experiments. The processed academic data is used to train a GBDT model; the leaf node that each sample finally falls into in every tree of the model is recorded, and these positions are turned into a new feature vector by one-hot encoding.

(2) Establishment of the potential mining model. The feature vectors obtained above are used as the input to logistic regression, the optimal model parameters are determined, and the precision of the potential mining model reaches 80.3% in experiments. Comparative experiments show that the proposed potential mining model outperforms the logistic regression model, GBDT, and random forest in precision, recall, and F-score, which demonstrates the effectiveness and accuracy of the proposed scheme.
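The two steps described above (GBDT leaf positions one-hot encoded, then fed to logistic regression) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the thesis's actual implementation: the data is synthetic (16 features and an approximate 1:2 positive-to-negative ratio are chosen only to mirror the abstract), and the helper names `fit_gbdt_lr` and `predict_gbdt_lr`, the tree count, and the tree depth are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def fit_gbdt_lr(X_train, y_train, n_trees=50, seed=0):
    """Hypothetical helper: train GBDT, one-hot encode leaf positions, fit LR."""
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=3, random_state=seed
    )
    gbdt.fit(X_train, y_train)
    # apply() returns the leaf index of each sample in each tree, with shape
    # (n_samples, n_estimators, n_classes); for binary labels the last axis is 1.
    leaves = gbdt.apply(X_train)[:, :, 0]
    # One column group per tree; unseen leaves at predict time become all zeros.
    encoder = OneHotEncoder(handle_unknown="ignore")
    lr = LogisticRegression(max_iter=1000)
    lr.fit(encoder.fit_transform(leaves), y_train)
    return gbdt, encoder, lr


def predict_gbdt_lr(model, X):
    """Hypothetical helper: route samples through the trees, then through LR."""
    gbdt, encoder, lr = model
    leaves = gbdt.apply(X)[:, :, 0]
    return lr.predict(encoder.transform(leaves))


# Synthetic stand-in for the scholar data: 16 features, roughly one positive
# sample for every two negatives. These are NOT the AMiner/MAG features.
X, y = make_classification(
    n_samples=1500,
    n_features=16,
    n_informative=8,
    weights=[2 / 3, 1 / 3],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = fit_gbdt_lr(X_train, y_train)
y_pred = predict_gbdt_lr(model, X_test)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f-score={f1:.3f}")
```

In this sketch the logistic regression is fitted on the same samples used to train the GBDT, which is the simplest arrangement; fitting it on a separate held-out split is a common variation. The printed scores come from synthetic data and say nothing about the 80.3% precision reported in the abstract.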
Keywords/Search Tags: Big scholarly data, Potential scholars, GBDT, Logistic regression, Random forest