
Hot Topic Detection And Topic Evolution Analysis In Social Media

Posted on: 2017-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2308330503983636
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and Web 2.0 technology, social media (such as microblogs and BBS forums) has become an important platform for information exchange. Formally, social media is a kind of web application, built on Web 2.0 technology, that allows users to publish content and communicate with each other. With the popularity of mobile devices, more and more people are willing to express and share their opinions on social media, and the user-generated data on social media has reportedly reached the terabyte level every day. Social media has therefore become a valuable data source for public emergency detection, public sentiment analysis, and public opinion monitoring.

Over 80% of the content on social media is still textual, so text mining has become a leading research focus for content analysis in social media, and topic models have proved to be a very effective means of text mining. The objective of a topic model is to infer the “document-topic” distribution and the “topic-word” distribution from the observed word distribution of each document. An appropriate topic model can not only detect latent topics but can also be applied to tasks such as text classification, hot topic extraction, and information organization. In the past few years, several classical topic models (such as PLSA and LDA) have been proposed and have successfully mined topics from a diverse range of document genres. However, on social media data such as tweets, they fail to identify high-quality underlying topics because such texts are short and informal. Many recent efforts have therefore attempted to improve traditional topic models (especially LDA) to cope with the characteristics of social media text. In this thesis, we first explore and summarize these LDA-based improvement methods.
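The “document-topic” and “topic-word” distributions described above can be illustrated with a minimal collapsed Gibbs sampler for plain LDA. This is a generic textbook sketch, not the thesis's own code or any of its extensions; the function name `lda_gibbs`, the symmetric hyperparameters `alpha` and `beta`, and the input format (each document as a list of integer word ids) are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, V).
    Returns (theta, phi) where theta[d][k] estimates P(topic k | doc d)
    ("document-topic") and phi[k][w] estimates P(word w | topic k) ("topic-word").
    """
    rng = random.Random(seed)
    V = 1 + max(w for doc in docs for w in doc)  # vocabulary size
    K = num_topics
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total tokens assigned to each topic
    z = []                               # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = j | z_-i, w), up to a constant.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of the two posterior distributions.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi
```

For example, `lda_gibbs([[0, 1, 2, 0], [3, 4, 5, 3]], num_topics=2)` returns one probability row per document in `theta` and one per topic in `phi`. On very short texts each document contributes only a handful of counts to `n_dk`, which is a commonly cited reason plain LDA struggles with tweets.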
Inspired by these LDA-based improvements, we propose to use attributes of microblogs (such as hashtags and timestamps) to improve LDA, and then apply the improved methods to hot topic detection and topic evolution analysis in social media.

For hot topic detection, we develop a new topic model named Multi-Attribute Latent Dirichlet Allocation (MA-LDA), which incorporates the time and hashtag attributes of microblogs into the LDA model. Using the time attribute, MA-LDA decides whether a word should appear in a hot topic. Meanwhile, compared with the traditional LDA model, incorporating the hashtag attribute gives core words an artificially high ranking in the results, which improves the expressiveness of the output. Empirical evaluations on real data sets demonstrate that our method detects hot topics more accurately and efficiently than several baselines, providing strong evidence of the importance of the temporal factor in extracting hot topics.

For further work on topic evolution over time, this thesis introduces a novel topic modeling approach that detects and tracks the evolution of content in social media by integrating hashtag and time information. Specifically, we develop two methods to cope with the different functions of hashtags. The first, named hashtag-generated Topic Over Time (hg-TOT), generates a document from its words and hashtags as a whole; in addition, the hashtags serve as weakly supervised information when sampling a topic. To strengthen the impact of hashtags through the topic variables, we further develop a second model named hashtag-supervised Topic Over Time (hs-TOT). Compared with LDA, our methods also compute two additional posterior distributions, namely the “topic-hashtag” distribution and the “topic-timestamp” distribution. Experiments on real data show that both hg-TOT and hs-TOT successfully detect and track meaningful content and topics.
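The abstract does not give the hg-TOT or hs-TOT equations, so the sketch below shows only the time component of the standard Topics over Time (TOT) model whose name both methods carry: each topic k has a Beta distribution over normalized timestamps (the “topic-timestamp” distribution), fitted by the method of moments, and a token's topic-sampling weight is the usual LDA term multiplied by the Beta density of its document's timestamp. Hashtag handling is deliberately omitted, and all function names (`normalize_timestamps`, `fit_beta_moments`, `beta_pdf`, `tot_topic_weights`) and default hyperparameters are hypothetical.

```python
import math


def normalize_timestamps(raw, eps=1e-3):
    """Min-max scale raw timestamps (e.g. Unix seconds) into the open interval (0, 1)."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.5 for _ in raw]  # no time spread: put everything mid-interval
    return [eps + (1 - 2 * eps) * (t - lo) / (hi - lo) for t in raw]


def fit_beta_moments(times):
    """Method-of-moments Beta(a, b) fit for timestamps in (0, 1).

    Matches mean m = a / (a + b) and variance m(1 - m) / (a + b + 1);
    falls back to the uniform Beta(1, 1) when the sample has too little spread.
    """
    n = len(times)
    if n < 2:
        return 1.0, 1.0
    m = sum(times) / n
    var = sum((t - m) ** 2 for t in times) / n
    if var <= 0:
        return 1.0, 1.0
    c = m * (1 - m) / var - 1  # equals a + b
    if c <= 0:
        return 1.0, 1.0
    return m * c, (1 - m) * c


def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t in (0, 1), computed in log space for stability."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_b)


def tot_topic_weights(n_dk, n_kw, n_k, vocab_size, t, psi, alpha=0.1, beta=0.01):
    """Unnormalized p(z = k | rest) for one token in a TOT-style Gibbs sampler.

    n_dk[k]: topic-k count in the token's document; n_kw[k]: count of the token's
    word under topic k; n_k[k]: total tokens in topic k (all excluding the current
    token); t: the document's normalized timestamp; psi[k] = (a_k, b_k).
    """
    return [
        (n_dk[k] + alpha)
        * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        * beta_pdf(t, *psi[k])
        for k in range(len(psi))
    ]
```

A full sampler would draw each token's topic from these weights (for example with `random.choices`) and refit every topic's `psi` from the timestamps of its tokens after each sweep. With equal count terms, an early timestamp such as `t=0.1` favours a topic whose Beta is concentrated early, e.g. `psi=[(2, 8), (8, 2)]`, which is how the time attribute separates topics that share vocabulary but peak at different times.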
Keywords: topic model, LDA, hot topic, topic evolution, social media