Research On Hot Topics With Topic Model

Posted on: 2021-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Cheng
GTID: 2428330629988460
Subject: Software engineering
Abstract/Summary:
News reports recent facts and covers everything that happens in society. Research on news text is important for studying social hot spots, the economic situation, and the state of social development, and news reporting is also an important source of information about society. In recent years, with the rise of online platforms and the emergence of a large amount of self-media, the volume of online news has exploded. Faced with so much tangled news, it is increasingly difficult for people to identify news hot spots, and mining hot spots from the data has become a research focus.

Many existing topic models work on the text alone and mine topics from the news content. This approach can yield good news topics, but when applied to online news it ignores the immediacy and interactivity of online news, and the mined topics are easily skewed by word frequency, which makes them inaccurate. It is therefore not well suited to topic mining of online news. This thesis uses the characteristics of online news to improve the traditional topic model.

Because online news is immediate and interactive, a news item, once published, draws many people to comment, express their opinions, and discuss with others. The number of commenters and the volume of comments therefore indicate how much attention a news item receives, and they serve as a measure of its popularity (heat). This thesis takes the view that hot news is more likely to give rise to news hot spots. On this basis, distinguishing hot news from ordinary news reduces the influence of irrelevant news and allows current hot spots to be mined more accurately.

The method first computes the heat of each news item so that the items in the data set are differentiated by heat, which reduces the influence of irrelevant information. It then uses the TextRank algorithm to rank the importance of each word in the news text. The word importance is combined with the news for topic mining, which mitigates the influence of word frequency, and the characteristics of online news are integrated into a heat-based topic mining model.

Traditional topic models such as LDA (Latent Dirichlet Allocation) are based on the bag-of-words model, so they do not consider relationships between words and lose contextual semantic information. A common remedy is to combine word vectors with the topic distribution of the topic model and then average the word vectors to compute topic vectors. In such methods, however, the word vectors and the topic distribution are not trained in the same semantic space, which makes the model less interpretable. To address this, this thesis proposes the LFH-LDA model and a Doc2vec-based LFH-LDA model for topic mining. Word vectors trained with the Doc2vec model compensate for the missing contextual semantics, and the LF-LDA model allows the word vectors and the LDA model to train topic vectors within the same model, which reduces information loss so that news hot spots can be mined more accurately.

The proposed method is verified on real news data from Sina News. Topic similarity, perplexity, and an analysis of the results are compared against the traditional LDA model, and the experiments verify the feasibility and effectiveness of the method. The proposed model achieves a better fit and improves topic quality.
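
The first two steps described above (scoring news heat from comment activity, then ranking words with TextRank) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual heat formula, so the log-damped sum below is an assumption, as are the function names `news_heat`, `textrank`, and `heat_weighted_importance` and the way heat scales word scores. TextRank is implemented here as a plain PageRank-style iteration over an unweighted co-occurrence graph.

```python
import math
from collections import defaultdict


def news_heat(num_commenters, num_comments):
    """Hypothetical heat score: grows with commenters and comments.

    The log damping is an assumption, not the thesis's formula.
    """
    return math.log1p(num_commenters) + math.log1p(num_comments)


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it within the window.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own links.
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated
    return scores


def heat_weighted_importance(tokens, num_commenters, num_comments):
    """Scale TextRank word scores by the news item's heat (an assumption)."""
    heat = news_heat(num_commenters, num_comments)
    return {word: heat * score for word, score in textrank(tokens).items()}


if __name__ == "__main__":
    tokens = ["market", "stock", "market", "bank", "market", "policy"]
    weights = heat_weighted_importance(tokens, num_commenters=120, num_comments=450)
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {weight:.3f}")
```

Weights of this kind could then replace raw word counts when building the input to a topic model, so that words from heavily discussed news contribute more than words from news that drew little attention.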
Keywords/Search Tags: LDA, Heat, TextRank, Doc2vec