
Study On The Key Technology Of Scholarly User Profile Based On Multi-Source And Heterogeneous Big Data

Posted on: 2019-07-02  Degree: Master  Type: Thesis
Country: China  Candidate: A Z Wen
GTID: 2428330566487226  Subject: Engineering
Abstract/Summary:
With the rapid development of scientific research, a large amount of information on scholarly users' attributes and behaviors has accumulated. This provides a richer data basis for constructing user profiles, but it also brings greater challenges. First, this thesis introduces the background and current status of related studies, and analyzes and summarizes the technologies used in scholarly user profiling. Second, it divides the scholarly user profile model into three modules: extraction of basic profile attributes, discovery of scholars' interest labels, and prediction of future academic impact. For each module, a corresponding model is proposed and evaluated experimentally. Finally, the models are implemented on distributed storage and parallel computing frameworks, and a prototype scholarly user profile system based on multi-source heterogeneous data is built. The main contributions are as follows.

(1) A profile attribute extraction model (PAE-NN) based on a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF) is proposed. Compared with previous CRF-based approaches, the model automatically learns character-level and contextual features through a deep neural network and can be trained end to end. It also effectively handles long-range dependencies between the entities being extracted, further improving the accuracy of extracting scholars' basic attributes.

(2) A multi-label classification model (LDANE) for scholars' interest labels is built by combining text semantic information with scholarly network information. Existing work generally relies on either text mining or label propagation, but not both. The model integrates several kinds of text semantic information related to scholars into a unified topic model framework, and uses a large-scale network representation learning method to extract features from the heterogeneous scholarly network. Finally, Stacking ensemble learning is introduced to improve classification accuracy.

(3) A prediction model of future academic impact (XG-RWTA) is built. Combined with classification-based filtering, the model improves prediction efficiency on datasets that follow a long-tail distribution. Taking publication time and authorship order into account, a scholar impact measure (RWTA) based on the heterogeneous scholarly network is defined and integrated into the prediction model as a feature. The model addresses impact prediction over long future time spans and further improves prediction accuracy.

(4) The above user profile models are implemented on big data processing frameworks such as Hadoop, Spark, and TensorFlow. Using distributed storage and parallel computing techniques, an architecture for multi-source heterogeneous data fusion is designed, and a prototype scholarly user profile system based on big data is built.
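The decoding step of a BiLSTM-CRF tagger such as PAE-NN in contribution (1) can be illustrated with the Viterbi algorithm. The sketch below is a minimal, dependency-free version under stated assumptions: the per-character emission scores (which a BiLSTM would produce) and the tag-transition scores (which a CRF layer would learn) are passed in as plain lists, and the three-tag BIO scheme, the function name `viterbi_decode`, and all numbers are hypothetical toy values, not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] score of moving from tag i to tag j (learned by a CRF)
    start[j]          score of tag j at the first position
    """
    n_tags = len(start)
    # score[j]: best total score of any tag path ending in tag j so far
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i to transition into tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy BIO example (hypothetical scores): 0 = O, 1 = B-NAME, 2 = I-NAME
EMISSIONS = [[0.1, 2.0, 0.5], [1.0, 0.2, 0.9], [2.0, 0.1, 0.3]]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: O -> I is effectively forbidden
    [0.0, -1.0, 1.0],   # from B: B -> I is favored
    [0.0, -1.0, 0.5],   # from I
]
START = [0.0, 0.0, -10.0]  # a sequence should not start with I

best_path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS, START)
greedy = [max(range(3), key=lambda j: row[j]) for row in EMISSIONS]
print(best_path, round(best_score, 2), greedy)
```

With these toy scores, picking the best tag at each position independently gives [1, 0, 0] (B, O, O), whereas Viterbi decoding returns [1, 2, 0] (B, I, O): the learned B-to-I transition outweighs the small emission advantage of O at the second position. This is how a CRF layer adds label-to-label consistency on top of the features a BiLSTM extracts.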
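The abstract does not give the RWTA formula, so the following is only a hypothetical sketch of the general idea behind contribution (3): a PageRank-style random walk over a weighted author citation graph in which each citation is down-weighted by its age and each author's share depends on authorship order. The exponential time decay, the harmonic 1/k authorship credit, the damping factor of 0.85, the restriction of the heterogeneous network to authors only, and all function and data names (`author_credit`, `impact_scores`, the toy papers) are assumptions for illustration; this is not the thesis's RWTA definition.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format: paper id -> (publication year, authors in byline order)
Papers = Dict[str, Tuple[int, List[str]]]


def author_credit(authors: List[str]) -> Dict[str, float]:
    """Hypothetical harmonic credit: the k-th author gets 1/k, normalized to sum to 1."""
    raw = {a: 1.0 / (k + 1) for k, a in enumerate(authors)}
    total = sum(raw.values())
    return {a: w / total for a, w in raw.items()}


def impact_scores(
    papers: Papers,
    citations: List[Tuple[str, str]],
    current_year: int,
    decay: float = 0.1,
    damping: float = 0.85,
    iters: int = 100,
) -> Dict[str, float]:
    """Weighted PageRank over an author citation graph (a sketch, not RWTA itself)."""
    nodes = sorted({a for _, authors in papers.values() for a in authors})
    # weights[a][b]: accumulated weight of citations from author a to author b
    weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for citing, cited in citations:
        year, citing_authors = papers[citing]
        _, cited_authors = papers[cited]
        # Older citations count less (assumed exponential decay by citing-paper age)
        time_w = math.exp(-decay * (current_year - year))
        for a, ca in author_credit(citing_authors).items():
            for b, cb in author_credit(cited_authors).items():
                if a != b:  # drop self-citation edges
                    weights[a][b] += time_w * ca * cb

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by authors with no outgoing edges is spread uniformly
        dangling = sum(rank[v] for v in nodes if v not in weights)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for a, out in weights.items():
            total = sum(out.values())
            for b, w in out.items():
                new[b] += damping * rank[a] * w / total
        rank = new
    return rank


papers: Papers = {
    "p1": (2015, ["alice", "bob"]),
    "p2": (2017, ["carol"]),
    "p3": (2018, ["dave", "alice"]),
}
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
scores = impact_scores(papers, citations, current_year=2019)
for author, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{author}: {s:.3f}")
```

In the toy data, dave is never cited, so he receives only the teleport and dangling-node share and ends up with the lowest score. The scores sum to 1 because the rank held by authors with no outgoing citations (here bob) is redistributed uniformly on every iteration. In a real system the decay rate, credit scheme, and graph schema would need to follow the thesis's own definition of RWTA.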
Keywords/Search Tags: Scholarly Big Data, User Profile, Interest Labels, Academic Impact, Distributed Storage and Computing