
Detection And Tracking Of Micro-blog Hot Topics Based On Time Series

Posted on: 2018-10-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y Y Ding    Full Text: PDF
GTID: 2348330533962724    Subject: Software engineering
Abstract/Summary:
Micro-blog has become an important platform for sharing and disseminating information. It generates online public opinion, which in turn affects the real world. To help maintain normal social order, detecting and tracking Micro-blog hot topics is particularly important.

First, the existing methods for detecting and tracking Micro-blog hot topics are studied: methods based on the Vector Space Model (VSM) and methods based on Latent Dirichlet Allocation (LDA). Hot topic detection methods such as K-means and topic tracking methods such as the decision tree algorithm are summarized and analyzed. It is found that the traditional VSM-based method is computationally very complex and that its results are not accurate or detailed enough. The LDA model is therefore the main object of study, and three ways of combining the LDA model with time series for detecting and tracking Micro-blog hot topics are analyzed: discretizing time before topic detection, discretizing time after topic detection, and detecting and tracking topics over time. It is found that discretizing time after topic detection and detecting and tracking topics over time can only track the intensity of Micro-blog hot topics and cannot track changes in their content, whereas discretizing time before topic detection can track both the intensity and the changes in content by calculating the topic relevance degree.

When the LDA model is applied with time discretized before topic detection, the relevance between topics must be calculated. It is found that the classical Kullback-Leibler (KL) divergence algorithm and its improved variants have defects when used to calculate the topic relevance degree. For example, the KL algorithm does not consider the similarity of the feature words of Micro-blog hot topics or the change of hot topic content over time. To solve this problem, the Jaccard-Word co-occurrence (JW) algorithm is proposed, which calculates the relevance degree of Micro-blog hot topics from the similarity of their feature words and the co-occurrence of their feature words. The probability that two topics have the same content is measured by the similarity of their feature words, and the probability that the content of two topics is related is measured by the co-occurrence of their feature words.

Experiments were performed on two data sets to verify the effectiveness of the JW algorithm. The results show that the recall rate, accuracy rate and F1 value of the JW algorithm are higher than those of the classical KL algorithm and the JSD-Cosine algorithm. Comparing the changes in intensity and in content of the hot topics over the time series with the development of real events shows that the detection and tracking results are in line with the development of real events, indicating that the JW algorithm is feasible and effective.
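
The abstract does not give the JW formula, so the following is only a minimal sketch of the idea it describes: a relevance score built from the Jaccard similarity of two topics' feature words plus a co-occurrence term. The function names, the way co-occurrence is counted (fraction of cross-topic word pairs that appear together in at least one post), and the weight `alpha` are all assumptions made for illustration, not the thesis's definitions.

```python
from itertools import product


def jaccard(words_a, words_b):
    """Jaccard similarity of two feature-word lists: |A & B| / |A | B|."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cooccurrence(words_a, words_b, posts):
    """Assumed co-occurrence measure (hypothetical, not from the thesis):
    the fraction of cross-topic pairs of distinct words that appear
    together in at least one post. `posts` is a list of token lists."""
    a, b = set(words_a), set(words_b)
    pairs = [(x, y) for x, y in product(a, b) if x != y]
    if not pairs:
        return 0.0
    post_sets = [set(p) for p in posts]
    hits = sum(
        1 for x, y in pairs
        if any(x in p and y in p for p in post_sets)
    )
    return hits / len(pairs)


def jw_relevance(words_a, words_b, posts, alpha=0.5):
    """Assumed combination: a weighted sum of the two measures.
    `alpha` is an illustrative parameter, not a value from the thesis."""
    return (alpha * jaccard(words_a, words_b)
            + (1 - alpha) * cooccurrence(words_a, words_b, posts))


if __name__ == "__main__":
    # Hypothetical feature words of the same topic in two adjacent time slices.
    topic_t1 = ["earthquake", "rescue", "sichuan"]
    topic_t2 = ["earthquake", "donation", "rescue"]
    posts = [
        ["earthquake", "rescue", "donation"],
        ["sichuan", "earthquake"],
    ]
    print(jw_relevance(topic_t1, topic_t2, posts))
```

In a tracking setting of the kind described above, a score like this would be computed between topics from adjacent time slices, and pairs above some threshold would be linked as the same evolving topic. The threshold and the linking rule are likewise left open here.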
Keywords/Search Tags:Micro-blog, Time, Topic Detection, Topic Tracking, Relevance calculation