
Real-time Temporal-spatio Trend Detection Of Bursty Topic On Microblog

Posted on: 2019-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2428330623950805
Subject: Software engineering
Abstract/Summary:
Online social networks have become one of the most important channels through which people learn about events as they break. Because posting is immediate and convenient, netizens increasingly take part in social-network interaction, and these networks are often flooded with posts about bursty real-world events. Real-time detection of bursty topics on microblogs has therefore attracted considerable research effort in recent years, owing to its use in a range of user-focused tasks such as information recommendation, trend analysis, and document search. Our goal is to advance recent work on early temporal-spatial detection of bursty topics, obtaining results that are both more effective and more efficient.

First, we propose a refined version of TopicSketch, a recent and efficient method for real-time bursty topic detection based on tensor decomposition and dimension reduction. Our main improvement targets word intrusion and topic coherence: clustering and fuzzy set theory are used jointly to extract informative, interpretable bursty topics together with their burst scores. Second, we propose a novel topic coherence measure, a sketch-based PMI method that estimates topic coherence from the pointwise mutual information (PMI) among topic words. The Word-Sketch statistics serve as the PMI reference corpus: words are dynamically sampled over consecutive sliding windows of the real-time data stream, and feeding these fresh word probabilities into PMI makes the coherence estimate more reasonable and precise. Third, we design a novel framework for spatial detection of bursty topics. We assume that all online topics occur on a countable space of two-tuples (province, city), and estimate the sociality of each two-tuple from a series of social features. The information diffusion distance over this two-tuple space is then quantified using sociality as a weight. In particular, a province with higher sociality contributes less to the diffusion distance, which helps highlight topics that occur in low-sociality areas or across almost the entire country.

Finally, we evaluate our methods on a stream of 7 million Sina microblog posts. The experimental results demonstrate both efficiency in topic detection with temporal and spatial burst scores and effectiveness in topic interpretability. In addition, the spatial burst score of each detected topic is useful for extracting topics with a distinctive spatial pattern. Specifically, on a single machine our method can consistently process millions of microblog posts per day with memory consumption below 70%, and presents ranked, interpretable topics with their temporal and spatial burst scores.
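The abstract does not spell out how the temporal burst score is computed. As a hedged illustration only, the sketch below scores burstiness as the acceleration of an exponentially decayed arrival rate, comparing a short and a long time scale. This is one common formulation in the spirit of TopicSketch's burst signal, but the class names (`DecayedRate`, `BurstScore`), the time scales, and the exact formula are assumptions rather than the thesis's definition.

```python
import math


class DecayedRate:
    """Hypothetical helper: exponentially decayed arrival rate with time scale `tau`."""

    def __init__(self, tau):
        self.tau = tau
        self.total = 0.0      # decayed sum of counts, valid as of `last_t`
        self.last_t = None

    def _decay_to(self, t):
        # Decay the running sum forward to time t (assumes non-decreasing timestamps).
        if self.last_t is not None:
            self.total *= math.exp((self.last_t - t) / self.tau)
        self.last_t = t

    def add(self, t, count=1.0):
        self._decay_to(t)
        self.total += count

    def velocity(self, t):
        self._decay_to(t)
        return self.total / self.tau


class BurstScore:
    """Acceleration = (short-scale rate - long-scale rate) / (long_tau - short_tau)."""

    def __init__(self, short_tau, long_tau):
        if not short_tau < long_tau:
            raise ValueError("short_tau must be smaller than long_tau")
        self.short = DecayedRate(short_tau)
        self.long = DecayedRate(long_tau)

    def add(self, t, count=1.0):
        # The same arrival updates both time scales.
        self.short.add(t, count)
        self.long.add(t, count)

    def acceleration(self, t):
        # Positive when recent activity outpaces the longer-term rate (a burst).
        return (self.short.velocity(t) - self.long.velocity(t)) / (
            self.long.tau - self.short.tau
        )
```

A sudden spike raises the short-scale rate before the long-scale one, so the acceleration turns positive; once activity dies down, the short-scale rate decays faster and the score goes negative.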
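The sketch-based PMI coherence measure can be illustrated with a minimal sketch, assuming exact document-frequency counts over a fixed-size sliding window of posts as a stand-in for the thesis's Word-Sketch sampling. PMI between two words is estimated as log p(w_i, w_j) / (p(w_i) p(w_j)), and a topic's coherence is taken as the average PMI over all pairs of its words. The class `SlidingWindowPMI`, its `window_size` and `eps` parameters, and the averaging rule are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations


class SlidingWindowPMI:
    """Hypothetical helper: document frequencies over the last `window_size`
    posts, used as the PMI reference corpus (exact counts stand in for a sketch)."""

    def __init__(self, window_size, eps=1e-12):
        self.window_size = window_size
        self.eps = eps               # smoothing so log() never receives 0
        self.posts = deque()         # word sets of the posts currently in the window
        self.word_df = Counter()     # posts in the window containing each word
        self.pair_df = Counter()     # posts in the window containing each word pair

    def _update(self, words, sign):
        # Add (sign=+1) or remove (sign=-1) one post's contribution to the counts.
        for w in words:
            self.word_df[w] += sign
        for pair in combinations(sorted(words), 2):
            self.pair_df[pair] += sign

    def add_post(self, tokens):
        words = frozenset(tokens)
        self.posts.append(words)
        self._update(words, +1)
        # Slide the window: evict the oldest post once capacity is exceeded.
        if len(self.posts) > self.window_size:
            self._update(self.posts.popleft(), -1)

    def pmi(self, w1, w2):
        n = len(self.posts)
        if n == 0:
            return 0.0
        p1 = self.word_df[w1] / n
        p2 = self.word_df[w2] / n
        p12 = self.pair_df[tuple(sorted((w1, w2)))] / n
        return math.log((p12 + self.eps) / (p1 * p2 + self.eps))

    def coherence(self, topic_words):
        # Average PMI over all unordered pairs of the topic's words.
        pairs = list(combinations(topic_words, 2))
        if not pairs:
            return 0.0
        return sum(self.pmi(a, b) for a, b in pairs) / len(pairs)
```

Because counts are updated incrementally as posts enter and leave the window, the word probabilities always reflect the most recent stream content, which is the property the abstract emphasizes. A real Word-Sketch would replace the exact `Counter` tables with a bounded-memory sampled summary.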
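The sociality-weighted diffusion distance can be sketched as follows, assuming sociality has already been estimated as a value in [0, 1] for each province. Each province contributes a weight of 1 − sociality, so higher-sociality provinces add less to the distance, as the abstract describes. Cross-province distance sums the two provinces' weights, and two different cities in the same province are scaled by a hypothetical `intra_weight`. The functions `diffusion_distance` and `spatial_spread`, the weighting scheme, and the use of mean pairwise distance as a spread score are illustrative assumptions, not the thesis's formulas.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (province, city)


def diffusion_distance(a: Location, b: Location,
                       sociality: Dict[str, float],
                       intra_weight: float = 0.5) -> float:
    """Hypothetical sociality-weighted distance between two (province, city) tuples."""
    if a == b:
        return 0.0
    (p1, _), (p2, _) = a, b
    # Each province contributes (1 - sociality): high-sociality provinces add less.
    w1 = 1.0 - sociality[p1]
    if p1 == p2:
        # Different cities in the same province: a scaled-down intra-province step.
        return intra_weight * w1
    return w1 + (1.0 - sociality[p2])


def spatial_spread(locations: List[Location], sociality: Dict[str, float]) -> float:
    """Mean pairwise diffusion distance over the locations of a topic's posts."""
    pairs = list(combinations(locations, 2))
    if not pairs:
        return 0.0
    return sum(diffusion_distance(a, b, sociality) for a, b in pairs) / len(pairs)
```

With hypothetical sociality values, a topic confined to one city scores 0, while a topic that reaches several provinces, especially low-sociality ones, scores higher.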
Keywords/Search Tags:Burst topic temporal-spatio detection, Topic Interpretability, Sociality, Burst scores