Bursty Event Detection In Microblog Based On Burst Words Regional Analysis

Posted on: 2018-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zhang
GTID: 2348330542464527
Subject: Computer system architecture
Abstract/Summary:
With massive volumes of microblog data, detecting bursty events in microblogs promptly and accurately is of great significance for monitoring online public opinion. An analysis of how microblog bursty events propagate reveals a regular pattern: the regional coverage of the microblog documents related to a bursty event gradually expands, reaches a peak, and then gradually narrows as the event evolves. Based on this pattern, this thesis proposes a method for detecting microblog bursty events based on regional analysis of burst words. Two dimensions of burst words, their regional property and their sentiment property, are used to identify microblog bursty events. First, positive documents are filtered out by sentiment calculation. Then, burst words are detected in the remaining documents according to the degree of regional diffusion of the feature words. Finally, a new bursty event detection method clusters the burst word set to find the microblog bursty events. Experimental results show that, compared with two methods from the literature, the proposed method clearly improves precision, recall, and F-measure.

To detect microblog bursty events in real time and efficiently from vast numbers of microblog documents in a big data setting, a parallel model of the detection method based on Spark is also proposed. The parallel model is realized in four main steps: parallel noise filtering, parallel regional burst detection of feature words, parallel construction of the co-occurrence matrix, and parallel bursty event detection and tracking. First, to filter noise documents in parallel, the microblog data stream is divided into time windows with Spark Streaming; within a specified time window, the sentiment value of each microblog document is computed in a distributed manner and the positive documents are filtered out in parallel. Second, the regional diffusion of each feature word is calculated from the regional breadth and depth of its spread using the Join operation provided by Spark, and the feature words that meet the burst characteristics are selected, which parallelizes regional burst detection. Third, the co-occurrence matrix is constructed by applying LeftOuterJoin between the RDD of the word set and the RDD of the microblog document set. Fourth, absolute clustering is implemented with a divide-and-conquer approach: local absolute clustering is first performed with the MapPartitions operation, and then, in the Reduce stage, the co-occurrence counts of the burst words in the cluster centers are used to calculate the similarity between the results from different slave nodes, yielding the RDD of microblog bursty events in the specified time window. Finally, the bursty events of the specified time window are matched against the bursty-event RDDs of the other time windows with the MapToPair operation, giving the microblog bursty events over all time windows. Experimental results show that the parallel method performs well in terms of SpeedUp, SizeUp, and ScaleUp.
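
The sequential pipeline summarized in the abstract (sentiment filtering, regional burst detection of feature words, co-occurrence counting) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation. The abstract does not give the formulas for regional diffusion, so the sketch assumes that only regional breadth is used, measured as the number of distinct regions mentioning a word in a time window, and that a word is bursty when that breadth at least doubles from the previous window. The record layout, the thresholds, and all function names are hypothetical, and plain Python stands in for Spark.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (time_window, region, sentiment_score, words).
DOCS = [
    (0, "Beijing", -0.6, ["earthquake", "rescue"]),
    (1, "Beijing", -0.7, ["earthquake", "rescue"]),
    (1, "Shanghai", -0.4, ["earthquake", "rescue"]),
    (1, "Chengdu", -0.8, ["earthquake", "aftershock"]),
    (1, "Chengdu", 0.9, ["holiday", "sunny"]),
    (0, "Shanghai", -0.1, ["traffic"]),
    (1, "Shanghai", -0.2, ["traffic"]),
]


def filter_positive(docs, threshold=0.0):
    """Keep only documents whose sentiment score is not positive (assumed rule)."""
    return [d for d in docs if d[2] <= threshold]


def regional_breadth(docs):
    """Map (window, word) to the number of distinct regions mentioning the word."""
    regions = defaultdict(set)
    for window, region, _score, words in docs:
        for word in set(words):
            regions[(window, word)].add(region)
    return {key: len(value) for key, value in regions.items()}


def detect_burst_words(breadth, window, min_growth=2.0):
    """Assumed burst rule: breadth grows by at least min_growth vs. the previous window."""
    bursts = set()
    for (win, word), now in breadth.items():
        if win != window:
            continue
        before = breadth.get((window - 1, word), 0)
        if now >= min_growth * max(before, 1):
            bursts.add(word)
    return bursts


def cooccurrence(docs, window, bursts):
    """Count how often each pair of burst words appears in the same document."""
    counts = Counter()
    for win, _region, _score, words in docs:
        if win != window:
            continue
        present = sorted(set(words) & bursts)
        counts.update(combinations(present, 2))
    return counts


kept = filter_positive(DOCS)
breadth = regional_breadth(kept)
bursts = detect_burst_words(breadth, window=1)
cooc = cooccurrence(kept, window=1, bursts=bursts)
print(bursts, dict(cooc))
```

In a Spark version, each function would roughly correspond to a transformation over an RDD of documents for one time window; the co-occurrence counts are the input a clustering step would group into events.
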
Keywords/Search Tags: bursty events detection, burst word, regional analysis, sentiment filter, Spark, parallel