Microblog Oriented Hot Topic Detection And Tracking

Posted on: 2015-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Sun
GTID: 2298330422483508
Subject: Computer technology
Abstract/Summary:
Microblogging is a new form of online media that has attracted wide attention across society. The posts that users publish often concern current hot topics. Hot topics on microblogs affect not only how events develop in the online community but also how people judge events in the real world, so detecting and tracking them is valuable for grasping trends in public interest. For this reason, microblog hot topic detection and tracking has attracted researchers' attention in recent years. In this paper, a new approach is proposed. The main research content is as follows:

(1) A conservative approach is taken to preprocessing: the useful fields of each microblog post are extracted, the text is segmented into words, split on whitespace and punctuation marks, and stop words are removed to obtain clean data.

(2) To address the feature sparsity of short text messages in microblog hot topic detection, we first explore the relations between terms and then build a term correlation matrix, which is much denser than the term-document matrix. Symmetric non-negative matrix factorization (SNMF) is applied to the term correlation matrix to obtain a term-topic matrix. Finally, we formulate the topic learning problem as probabilistic latent semantic analysis (pLSA) on the term-topic matrix. This paper also defines topic heat and a mechanism for ranking topics. Experiments show that our method clusters topics effectively and can be applied to microblog hot topic detection.

(3) Traditional hot topic tracking methods ignore the relations between terms and suffer from topic drift. To overcome these drawbacks, we propose a self-adaptive hot topic tracking method that incorporates term relations. We first presort the dataset to reduce the volume of microblog data, and then compute the mutual information of terms within the same document and the information across different documents. The conventional text representation model is updated using these term relations, and a similarity calculation decides whether a post is a follow-up discussion of a hot topic. Finally, the vectors of the hot topics are updated to avoid topic drift.

We focus on the three points above for microblog hot topic detection and tracking. Experiments show the effectiveness of our method.
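The first half of step (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how term correlation is measured, so the co-occurrence normalisation below is an assumption, as are the function names `term_correlation`, `snmf`, and `recon_error` and the toy documents. The factorization uses the standard multiplicative update for symmetric NMF with damping factor 0.5, and the sketch stops before the pLSA step.

```python
import math
import random
from itertools import combinations


def term_correlation(docs):
    """Return (vocab, A): a symmetric term-term matrix built from co-occurrence.

    Assumption (not from the thesis): A[i][j] = co(i, j) / sqrt(df_i * df_j),
    where co counts documents containing both terms and df is document frequency.
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    df = [0] * n
    co = [[0] * n for _ in range(n)]
    for d in docs:
        ids = sorted({index[t] for t in d})
        for i in ids:
            df[i] += 1
        for i, j in combinations(ids, 2):
            co[i][j] += 1
            co[j][i] += 1
    A = [[1.0 if i == j else co[i][j] / math.sqrt(df[i] * df[j])
          for j in range(n)] for i in range(n)]
    return vocab, A


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def snmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate A by H H^T with H >= 0 using multiplicative updates."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)                            # n x k
        HtH = matmul([list(c) for c in zip(*H)], H)  # k x k
        HHtH = matmul(H, HtH)                        # n x k
        H = [[H[i][j] * (1 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return H


def recon_error(A, H):
    """Squared Frobenius error ||A - H H^T||^2."""
    R = matmul(H, [list(c) for c in zip(*H)])
    n = len(A)
    return sum((A[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical, already-tokenized posts covering two topics.
    docs = [
        ["earthquake", "rescue", "sichuan"],
        ["earthquake", "rescue"],
        ["rescue", "sichuan"],
        ["film", "award", "actor"],
        ["film", "actor"],
        ["award", "actor"],
    ]
    vocab, A = term_correlation(docs)
    H = snmf(A, k=2)
    for term, row in zip(vocab, H):
        print(term, [round(v, 3) for v in row])
```

Each row of `H` gives a term's non-negative loadings on the `k` topics. In a pipeline like the one described above, such a term-topic matrix would be the input to the later topic learning step.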
Keywords/Search Tags:Microblog, Self-adaption, Hot topic detection, Hot topic tracking