
Design And Implementation Of Technology News Analysis System Based On Topic Model

Posted on: 2020-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Zhang
GTID: 2428330575957091
Subject: Computer technology
Abstract/Summary:
Since the beginning of the 21st century, technology has developed rapidly, and against this background a large volume of technical literature and technology news has been published. On the one hand, these resources provide a wealth of data and information; on the other hand, they also bring the problem of information overload. Faced with the huge knowledge network on the Internet, researchers find it difficult to efficiently obtain valuable information about technology hotspots from technology news websites. In recent years, data mining has gradually become a popular research subject, and many natural language processing techniques have been applied to news analysis with good results. Therefore, to address information overload in technology news, this thesis studies topic clustering and keyword extraction techniques and implements a complete technology news analysis system that helps users obtain news information efficiently. The main work of this thesis is as follows:

(1) An improved Biterm Topic Model (BTM) is proposed. By introducing the degree of association between words and documents into the Gibbs sampling process, it solves the problem in the traditional BTM that all words are given the same weight. Compared with similar models, experimental results show that the proposed model performs better in terms of topic coherence and JS (Jensen-Shannon) divergence.

(2) Traditional keyword extraction algorithms do not consider the relationship between keywords and the topics of an article. This thesis therefore combines the topic model with a word vector model and considers both the topic features and the statistical features of words when extracting keywords. Because a topic layer is added between words and articles, the keywords extracted by this method are more semantically relevant to their articles.

(3) A complete technology news analysis system is designed and implemented. The system clusters technology news by topic and extracts the keywords corresponding to each topic, so that users can quickly and intuitively understand the current focus of technology news and then obtain the information they are interested in more accurately.
Keywords/Search Tags: text mining, biterm topic model, word2vec, keyword extraction
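The abstract does not give the word-document association measure used in the improved sampler, so the following is only a hypothetical illustration of the idea behind point (1). It runs collapsed Gibbs sampling for a standard BTM, but each biterm adds a fractional weight to the topic counts instead of 1. The weight is assumed here to be the mean smoothed TF-IDF score of the biterm's two words in their source document; `tfidf_weights`, `weighted_btm`, the toy corpus, and the values of `alpha`, `beta`, and `iters` are illustrative choices, not the thesis's implementation.

```python
import math
import random
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """ASSUMED word-document association: smoothed TF-IDF of each word in each doc."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    return [{w: c / len(doc) * idf[w] for w, c in Counter(doc).items()} for doc in docs]


def weighted_btm(docs, K, alpha=1.0, beta=0.01, iters=100, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: m for m, w in enumerate(vocab)}
    M = len(vocab)

    # Biterms: every unordered pair of word positions in a short document.
    biterms, weights = [], []
    for doc, assoc in zip(docs, tfidf_weights(docs)):
        for wi, wj in combinations(doc, 2):
            biterms.append((idx[wi], idx[wj]))
            weights.append((assoc[wi] + assoc[wj]) / 2)
    if not biterms:
        raise ValueError("need at least one document with two or more words")
    mean_w = sum(weights) / len(weights)
    weights = [w / mean_w for w in weights]  # total mass equals the biterm count

    n_k = [0.0] * K                       # weighted biterm count per topic
    n_wk = [[0.0] * M for _ in range(K)]  # weighted word count per topic
    z = [rng.randrange(K) for _ in biterms]
    for (i, j), w, k in zip(biterms, weights, z):
        n_k[k] += w
        n_wk[k][i] += w
        n_wk[k][j] += w

    for _ in range(iters):
        for b, ((i, j), w) in enumerate(zip(biterms, weights)):
            k = z[b]
            n_k[k] -= w  # remove this biterm's weighted counts
            n_wk[k][i] -= w
            n_wk[k][j] -= w
            # BTM full conditional, using weighted instead of unit counts
            probs = [
                (n_k[t] + alpha) * (n_wk[t][i] + beta) * (n_wk[t][j] + beta)
                / (2 * n_k[t] + M * beta) ** 2
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=probs)[0]
            z[b] = k
            n_k[k] += w
            n_wk[k][i] += w
            n_wk[k][j] += w

    theta = [(n_k[t] + alpha) / (sum(n_k) + K * alpha) for t in range(K)]
    phi = [[(n_wk[t][m] + beta) / (2 * n_k[t] + M * beta) for m in range(M)]
           for t in range(K)]
    return vocab, theta, phi


docs = [["chip", "gpu", "ai"], ["gpu", "ai", "model"],
        ["rocket", "launch", "orbit"], ["orbit", "satellite", "launch"]]
vocab, theta, phi = weighted_btm(docs, K=2)
for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda m: -row[m])[:3]
    print(f"topic {t}:", [vocab[m] for m in top])
```

Rescaling the weights to a mean of 1 keeps the total count mass equal to that of the unweighted model, so `alpha` and `beta` keep roughly their usual role; words that are characteristic of their document then influence their biterms' topic assignments more strongly than common words do.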
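The abstract reports topic coherence without naming the measure. Assuming the common UMass measure, the coherence of a topic whose top words are ordered w_1, ..., w_N by probability is the sum over i = 2..N and j < i of log((D(w_i, w_j) + 1) / D(w_j)), where D(...) counts the documents containing all the given words; higher values mean the top words co-occur more often. A minimal sketch on a hypothetical toy corpus:

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic; top_words ordered from most to least probable.
    Assumes every top word occurs in at least one document."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # number of documents containing all of `words`
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            pair = doc_freq(top_words[i], top_words[j])
            score += math.log((pair + 1) / doc_freq(top_words[j]))
    return score


docs = [["chip", "gpu", "ai"], ["gpu", "ai", "model"],
        ["rocket", "launch", "orbit"], ["orbit", "satellite", "launch"]]
coherent = umass_coherence(["gpu", "ai"], docs)   # log(3/2), about 0.405
mixed = umass_coherence(["gpu", "orbit"], docs)   # log(1/2), about -0.693
print(coherent, mixed)
```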
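JS (Jensen-Shannon) divergence between two topic-word distributions P and Q is JS(P, Q) = 1/2 KL(P || M) + 1/2 KL(Q || M) with M = (P + Q) / 2; it is symmetric and, with natural logarithms, bounded by ln 2. When it is used to compare topic models, a larger average divergence between topics usually indicates more distinct, less overlapping topics. The abstract does not say how the score was aggregated, so this sketch assumes the mean over all topic pairs:

```python
import math
from itertools import combinations


def kl(p, q):
    # KL(p || q); terms with p_i = 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution M
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_pairwise_js(phi):
    """Average JS divergence over all pairs of topic-word distributions (rows of phi)."""
    pairs = list(combinations(phi, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


distinct = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
overlapping = [[0.5, 0.5, 0.0, 0.0], [0.4, 0.4, 0.1, 0.1]]
print(mean_pairwise_js(distinct))     # ln 2, about 0.693 (no shared words)
print(mean_pairwise_js(overlapping))  # about 0.075
```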
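Point (2) names the ingredients of the keyword extractor (topic features, word vectors, and statistical features) but not how they are combined. The following is one hypothetical way to do it: each candidate word in a document is scored by a weighted sum of (a) its topic-mediated probability p(w | d) = sum_k p(k | d) p(w | k), (b) the cosine similarity between its word vector and a vector for the document's dominant topic, taken here as the p(w | k)-weighted mean of word vectors, and (c) a precomputed TF-IDF score. The feature weights, the two-dimensional toy vectors (standing in for word2vec output), and the function `score_keywords` are assumptions for illustration only.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalise(feature):
    """Scale a {word: value} map so its largest value becomes 1."""
    top = max(feature.values())
    return {w: v / top if top > 0 else 0.0 for w, v in feature.items()}


def score_keywords(doc, theta_d, phi, vocab, vectors, tfidf,
                   weights=(0.4, 0.3, 0.3), top_n=3):
    """HYPOTHETICAL ranking: weighted sum of topic, semantic and statistical features."""
    idx = {w: m for m, w in enumerate(vocab)}
    cands = sorted(set(doc))
    K = len(phi)
    # (a) topic feature: p(w | d) = sum_k p(k | d) * p(w | k)
    topic = {w: sum(theta_d[k] * phi[k][idx[w]] for k in range(K)) for w in cands}
    # (b) semantic feature: cosine to the dominant topic's p(w | k)-weighted mean vector
    k_star = max(range(K), key=lambda k: theta_d[k])
    dim = len(vectors[vocab[0]])
    topic_vec = [sum(phi[k_star][m] * vectors[w][t] for m, w in enumerate(vocab))
                 for t in range(dim)]
    semantic = {w: max(cosine(vectors[w], topic_vec), 0.0) for w in cands}
    # (c) statistical feature: precomputed TF-IDF of the word in this document
    stats = {w: tfidf[w] for w in cands}
    feats = [normalise(f) for f in (topic, semantic, stats)]
    scores = {w: sum(a * f[w] for a, f in zip(weights, feats)) for w in cands}
    return sorted(cands, key=lambda w: -scores[w])[:top_n]


vocab = ["ai", "chip", "gpu", "launch", "orbit"]
phi = [[0.40, 0.20, 0.35, 0.025, 0.025],   # p(w | k): a "computing" topic
       [0.025, 0.025, 0.05, 0.45, 0.45]]   # and a "space" topic
vectors = {"ai": [0.9, 0.1], "chip": [0.7, 0.3], "gpu": [0.8, 0.2],
           "launch": [0.1, 0.9], "orbit": [0.2, 0.8]}  # stand-ins for word2vec
doc = ["gpu", "ai", "chip", "launch", "gpu"]
tfidf = {"ai": 0.2, "chip": 0.3, "gpu": 0.4, "launch": 0.1}
keywords = score_keywords(doc, [0.9, 0.1], phi, vocab, vectors, tfidf)
print(keywords)  # ['gpu', 'ai', 'chip']
```

Max-normalising each feature before mixing keeps any one feature's scale from dominating; the topic-vector similarity is intended to let a word that fits the document's main topic outrank a word that is merely frequent.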