
The Design And Implementation Of Scholar Research Interest Discovery System Based On Topic Model

Posted on: 2021-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhang
GTID: 2518306557494154
Subject: Software engineering
Abstract/Summary:
The construction of an industry-university-research data platform promotes exchange and cooperation between enterprises and university academics. The research interests of the scholars on the platform are an important reference for enterprises and other users who want to learn about scholars and cooperate with them. A timely and comprehensive description of scholars' research interests gives the platform a basis for the preliminary selection of scholars, enriches the research interest labels in scholar portraits, allows suitable companies to be recommended according to those labels, provides data support for dividing scholars into communities, and adds a further basis for judging retrieval results. However, updating and analyzing scholars' research interests in a timely and comprehensive way greatly increases the workload of platform staff and places higher demands on their professional knowledge. A data extraction tool that efficiently obtains scholars' research interests and displays them comprehensively therefore has high application value: it improves the efficiency of operation and maintenance personnel and, in turn, the efficiency of platform operation.

This thesis designs and implements a research interest discovery system for scholars. The system encapsulates the workflow required for research interest discovery and helps operation and maintenance personnel maintain the relevant data efficiently without knowledge of the underlying data processing techniques or domain expertise. It applies topic models to literature data to extract scholars' research interests, extends this with functions such as research interest evolution and similar-scholar recommendation, and presents the results through Web-based data visualization. Besides improving the work efficiency of staff, the system also lets ordinary users understand scholars more intuitively. The main work of this thesis is as follows:

(1) An input optimization strategy for the ETM topic model, based on data preprocessing, is proposed. A crawler collects the required data from the network, and a pseudo-long-text aggregation strategy based on similar documents is proposed: each original document is expanded with its similar documents, and the combined text is used as the input document. In addition, term-based topic extraction for scientific and technological literature is proposed. A term recognition model and the extraction of domain-specific stop words are implemented, and professional terms and domain-specific stop words are introduced as prior knowledge in topic modeling. The experimental results show that the model using the optimization strategy substantially improves evaluation metrics such as topic coherence and the interpretability of the results.

(2) Building on the ETM optimization strategy, an ECK model for research interest extraction is proposed. First, a relatively large number of topics is selected according to perplexity, and ETM modeling is performed to complete fine-grained topic extraction. Next, the processed topic-word distributions are taken as input, and the number of research interest clusters for a scholar is obtained automatically by the Canopy algorithm. Finally, the K-Means algorithm clusters the topics into research interest clusters, which establishes the correspondence between literature topics and research interests and thereby yields the research direction of each document. Through in-depth analysis of the resulting data, the distribution of scholars' research interests, the evolution of those interests, and recommendations of similar scholars are obtained, completing a comprehensive characterization of scholars' research interests. The experimental results show that the ECK model improves the accuracy of the research direction assigned to the literature, thereby ensuring the accuracy of the data on scholars' research interests.

(3) Following software engineering principles, this thesis designs and implements a scholar research interest discovery system consisting of a data acquisition module, a data preprocessing module, a research interest extraction module, and a data visualization module. The test results verify the usability and effectiveness of the system and show that it achieves the expected research objectives.
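
The clustering stage of the ECK model in item (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the distance metric, the Canopy thresholds, or how the topic-word distributions are processed, so the Euclidean distance, the thresholds `t1` and `t2`, the function names, and the toy vectors standing in for ETM topic-word distributions are all assumptions made for illustration. The sketch shows only the idea that Canopy supplies the number of clusters and the initial centers, and K-Means then refines the assignment.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2.

    Returns a list of (center, members). The number of canopies is used
    as the cluster count for K-Means.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Every point within t1 of the center belongs to this canopy.
        members = [p for p in points if distance(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 can no longer become centers of new canopies.
        remaining = [p for p in remaining if distance(center, p) > t2]
    return canopies


def k_means(points, initial_centers, max_iter=100):
    """Plain K-Means seeded with the given centers; returns (labels, centers)."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy stand-ins for five topic-word distributions over a three-word vocabulary.
topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.1, 0.8],
]

seeds = [center for center, _ in canopy(topics, t1=0.6, t2=0.3)]
labels, centers = k_means(topics, seeds)
print(len(seeds), labels)  # 3 [0, 0, 1, 1, 2]
```

In this toy run Canopy finds three canopies, so K-Means groups the five topics into three research interest clusters without the cluster count being set by hand. On real data the result depends strongly on the chosen thresholds and on the order in which points are visited.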
Keywords/Search Tags: topic model, research interest, clustering algorithm, term recognition, vertical crawler, software engineering