Research And Application On Big Scholarly Data-based Key Technique Of Academic Search System

Posted on: 2021-03-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Z M Wu | Full Text: PDF
GTID: 2428330611965582 | Subject: Computer technology
Abstract/Summary:
As scientific research develops rapidly, researchers increasingly rely on academic search systems to support their work. However, with the continuous growth of academic data on the Internet and rising user demands, academic search systems face greater challenges. Name disambiguation and academic text matching are two key technologies of an academic search system. The former ensures the accuracy of the academic data; the latter underlies the function that researchers use most often and is also the basis of other upper-level services.

Existing name disambiguation methods have the following problems: 1) the effective information features are not fully exploited; 2) algorithms designed for homogeneous networks cannot extract relations among heterogeneous features; 3) disambiguation accuracy is insufficient, and outlier paper collections are easily produced. Meanwhile, the keyword-retrieval matching used in traditional academic search systems is suitable only for standard document retrieval and ignores users' demand for semantic retrieval. To address these problems, this thesis studies name disambiguation and academic text matching in depth. The main work and contributions are as follows:

(1) An algorithm based on meta-path random walks is proposed for name disambiguation from scratch. Compared with previous models, the algorithm uses meta-path guidance to explore node associations in heterogeneous academic networks from the perspective of different document attributes. After unsupervised clustering, a rule-based re-clustering step merges outlier papers, which further improves disambiguation accuracy. Compared with other algorithms, the average disambiguation accuracy increases by 28%.

(2) An algorithm that combines multiple classes of fine-grained features is proposed for continuous name disambiguation. Unlike previous studies, which used co-authorship relations or a small number of attribute features, the proposed algorithm performs fine-grained feature engineering on document attributes, including a triplet loss network built to extract text embedding features. A stacking ensemble method then further improves disambiguation, giving a result 7.2% higher than the best competing models.

(3) Based on the characteristics of academic text, the PSBERT model is proposed for academic text matching. First, a representation layer produces the text representation vectors. The PSBERT model then learns explicit and implicit interaction features of the text through an interaction layer and a deep network layer. In comparisons with other methods on a Chinese data set and an English data set, the corresponding evaluation metrics are 2.1% and 5.9% higher, respectively, than those of the best competing algorithms, which demonstrates the effectiveness of the model.

(4) The above models are implemented on deep learning and information retrieval frameworks such as TensorFlow and Elasticsearch. Combined with the previous work of the author's research group, a prototype academic search system with semantic retrieval is implemented.
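
The abstract does not give implementation details for the meta-path random walk in contribution (1). As an illustration only, the following is a minimal sketch of a meta-path-guided walk over a toy heterogeneous academic network. The function name `meta_path_walk`, the dictionaries `neighbors` and `node_type`, and the paper/author/venue schema are all hypothetical and are not taken from the thesis.

```python
import random


def meta_path_walk(neighbors, node_type, start, meta_path, walk_length, rng=None):
    """Generate one random walk whose node types follow a cyclic meta-path.

    neighbors:   dict mapping node -> list of adjacent nodes (hypothetical format)
    node_type:   dict mapping node -> type label, e.g. "paper" or "author"
    meta_path:   list of type labels that starts and ends with the same type,
                 e.g. ["paper", "author", "paper"]
    walk_length: maximum number of nodes in the returned walk
    """
    rng = rng or random.Random()
    if len(meta_path) < 2 or meta_path[0] != meta_path[-1]:
        raise ValueError("meta_path must start and end with the same node type")
    if node_type[start] != meta_path[0]:
        raise ValueError("start node type must match the first type in meta_path")

    # Drop the repeated last type so the pattern can be cycled indefinitely.
    pattern = meta_path[:-1]
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[len(walk) % len(pattern)]
        # Only neighbors of the type required by the meta-path are eligible.
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy network (assumed data): three papers, two authors, two venues.
neighbors = {
    "p1": ["a1", "v1"],
    "p2": ["a1", "a2", "v1"],
    "p3": ["a2", "v2"],
    "a1": ["p1", "p2"],
    "a2": ["p2", "p3"],
    "v1": ["p1", "p2"],
    "v2": ["p3"],
}
node_type = {
    n: {"p": "paper", "a": "author", "v": "venue"}[n[0]] for n in neighbors
}

walk = meta_path_walk(
    neighbors, node_type, "p1", ["paper", "author", "paper"], 7, random.Random(0)
)
```

In a pipeline of this kind, walks generated under several meta-paths (for example paper-author-paper and paper-venue-paper) are commonly fed to a skip-gram model to learn paper embeddings, which are then clustered. Whether the thesis follows exactly that pipeline is an assumption here, not something the abstract states.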
Keywords/Search Tags: Big Scholarly Data, Heterogeneous Network, Name Disambiguation, Text Matching