
Research Hotspot Situation Analysis Based On Topic Model

Posted on: 2019-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wu
GTID: 2348330569495552
Subject: Engineering
Abstract/Summary:
In recent years, the number of researchers has grown rapidly, and the volume of academic literature has grown with it. As a result, it is no longer practical to track the evolution of research hotspots manually. Knowing how research hotspots develop does several things:
- it helps researchers find relevant material;
- it helps scholars keep up with developments in their field in a timely way;
- it supports decisions on research investment, as well as national guidance and encouragement of scientific research.

Earlier work identified hot topics with simple statistical methods. These methods ignore the similarity between words and cost a great deal of labor and time. In addition, most current studies of research hotspots focus on Chinese academic literature, although many important research results are published in English.

To better track the development of current research hotspots, this thesis processes and analyzes SCI academic literature. The data are cleaned and denoised, segmented into words, filtered to remove stop words, and stemmed (root reduction). The data are then analyzed with topic-modeling techniques based on word2vec and LDA, which extract the research hotspot topics and their associated term lists. Finally, the results are presented visually. The main work of this thesis is as follows:

1) Analysis of the experimental data with a topic model that combines word2vec and LDA. The thesis improves the LDA topic model by introducing word2vec word-vector representations. The topic-word matrix of the traditional LDA model becomes a topic-word vector matrix. This compensates for LDA's lack of contextual semantic information and makes it possible to measure the similarity of text data. To determine the optimal number of topics, the thesis converts the choice into a statistical problem: the optimal number is analyzed quantitatively and computed with an F statistic. The thesis compares the traditional LDA model with the proposed model using perplexity. It also measures the development of topics in three respects: topic intensity, topic similarity, and topic stability.

2) Visual analysis of research hotspots. The thesis uses three methods to visualize how research hotspots develop: a static visualization (word clouds) and two dynamic visualizations (ThemeRiver and TIARA, Text Insight via Automated Responsive Analytics).

Standard LDA ignores the similarity between words; the combined word2vec and LDA model accounts for it. The extracted topics and keywords are therefore more reasonable, and the visualizations present the research hotspots clearly. This makes it easier for users to identify the hotspots of a given period and to follow how a particular hotspot develops. It helps researchers keep abreast of active research areas and informs national support for, and guidance of, scientific research.
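The preprocessing steps named in the abstract are cleaning and denoising, word segmentation, stop-word removal, and root reduction. They can be sketched in plain Python as below. This is a minimal illustration, not the thesis's pipeline. The stop-word list, the names `preprocess` and `crude_stem`, and the suffix rules are all hypothetical. The suffix stripper is a deliberately crude stand-in for a real stemmer such as the Porter stemmer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical suffix rules, checked in order; a crude stand-in for a real stemmer.
SUFFIXES = ("ing", "ies", "es", "s", "ed")

def crude_stem(token: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain ('ies' -> 'y')."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return token

def preprocess(text: str) -> List[str]:
    """Clean, tokenize, drop stop words, and stem one document."""
    text = re.sub(r"<[^>]+>", " ", text)               # denoise: drop leftover HTML tags
    tokens = re.findall(r"[a-z0-9]+", text.lower())    # segment into lowercase word tokens
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("<p>Studies on the Topics of Networks</p>")` yields `["study", "topic", "network"]`. In this form the output could feed both word2vec training and LDA.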
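The abstract says the LDA topic-word matrix is turned into a topic-word vector matrix using word2vec, but it does not give the construction. One plausible reading is assumed here:
- each topic is represented as the probability-weighted average of the word2vec vectors of its top words;
- topics are then compared with cosine similarity, for example the same topic in two adjacent time slices.

The function names and toy vectors are hypothetical. In practice the vectors would come from a trained word2vec model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def topic_vector(top_words: Sequence[Tuple[str, float]],
                 word_vectors: Dict[str, Vector]) -> Vector:
    """Probability-weighted mean of the word vectors of a topic's top words.

    top_words    : (word, p(word | topic)) pairs, e.g. an LDA topic's top-N words
    word_vectors : word -> embedding; out-of-vocabulary words are skipped
    """
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for word, prob in top_words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # word has no embedding; ignore it
        for i, x in enumerate(vec):
            acc[i] += prob * x
        total += prob
    if total == 0.0:
        raise ValueError("none of the topic's words have a word vector")
    return [x / total for x in acc]  # renormalize over the words actually used

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Suppose a topic in one period has a high cosine with a topic in the next period. That suggests the same hotspot persists, which is one way topic similarity and topic stability could be quantified.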
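The abstract states only that the optimal number of topics is computed with an F statistic. The sketch below is therefore an assumption, not the thesis's method. It uses the Caliński-Harabasz pseudo-F:

F = (between-group sum of squares / (K - 1)) / (within-group sum of squares / (N - K))

Each topic is treated as a non-empty group of vectors, and the candidate topic count with the largest F is selected. The thesis may define its F statistic differently. `pseudo_f`, `best_k`, and the input layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def _mean(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pseudo_f(groups: Sequence[Sequence[Vector]]) -> float:
    """Calinski-Harabasz pseudo-F: (between SS / (K - 1)) / (within SS / (N - K))."""
    k = len(groups)
    all_vectors = [v for g in groups for v in g]
    n = len(all_vectors)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more vectors than groups")
    grand = _mean(all_vectors)
    means = [_mean(g) for g in groups]
    between = sum(len(g) * _sq_dist(m, grand) for g, m in zip(groups, means))
    within = sum(_sq_dist(v, m) for g, m in zip(groups, means) for v in g)
    if within == 0.0:
        return math.inf  # every group collapses to a single point
    return (between / (k - 1)) / (within / (n - k))

def best_k(groups_by_k: Dict[int, Sequence[Sequence[Vector]]]) -> int:
    """Return the candidate topic count whose grouping maximizes the pseudo-F."""
    return max(groups_by_k, key=lambda k: pseudo_f(groups_by_k[k]))
```

Under this assumption, each candidate K would come from a separate LDA run. Each group could hold, for example, the word2vec vectors of one topic's top words.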
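Perplexity is the index used to compare the traditional LDA model with the proposed model. It is conventionally defined as follows, and lower values indicate a better fit:

perplexity = exp(-(sum of log p(w|d) over all tokens) / (total number of tokens))

where p(w|d) = sum over topics k of theta[d][k] * phi[k][w].

The minimal sketch below assumes the document-topic matrix `theta` and the topic-word matrix `phi` have already been estimated. It also assumes every probability in `phi` is positive, as Dirichlet smoothing normally ensures. The function name is hypothetical.

```python
import math
from typing import Sequence

def perplexity(docs: Sequence[Sequence[int]],
               theta: Sequence[Sequence[float]],
               phi: Sequence[Sequence[float]]) -> float:
    """Perplexity of documents under an estimated topic model.

    docs  : each document as a list of word ids
    theta : theta[d][k] = p(topic k | document d)
    phi   : phi[k][w]   = p(word w | topic k), assumed strictly positive
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) marginalizes over topics
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)
```

A model that predicts every word with probability 1/V has perplexity V. Sharper, better-fitting topics push the value down.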
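The abstract does not define topic intensity. A common definition, assumed here, is the mean document-topic proportion of a topic over all documents in one time slice. The resulting per-period series is also the kind of data a ThemeRiver view draws as stream widths. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def topic_intensity(theta: Sequence[Sequence[float]],
                    periods: Sequence[str]) -> Dict[str, List[float]]:
    """Mean document-topic proportion per period: intensity[t][k] = mean of theta[d][k].

    theta   : theta[d][k] = p(topic k | document d)
    periods : periods[d] = time slice (e.g. publication year) of document d
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, period in zip(theta, periods):
        if period not in sums:
            sums[period] = [0.0] * len(row)
        for k, p in enumerate(row):
            sums[period][k] += p
        counts[period] += 1
    # periods are returned in sorted order so the series reads chronologically
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}
```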
Keywords/Search Tags: word2vec model, LDA topic model, text visualization, research hotspot