The Research On Method Of Bursty Event Detection Based On Micro-blog

Posted on: 2017-01-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhang | Full Text: PDF
GTID: 2308330503961496 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, news sites, forums, microblogs and other emerging Internet services have become important platforms for spreading and obtaining information. Microblogs in particular have spread rapidly in recent years and are popular with users for their real-time nature and convenience. The dynamic information on microblogs has become a powerful tool for gauging the pulse of the community, so mining events from the mass of microblog information and discovering social dynamics matters for social stability and the public interest. In this thesis, we choose Twitter as the target microblog platform and detect and track bursty events based on Twitter data. Our work is divided into the following parts:
First, we select an appropriate word segmentation tool. We choose ansj as the Chinese segmentation tool and build a custom user dictionary and a stop-word list. According to the characteristics of emergencies, we design the database structure and build the tables related to bursty events.
Next, we divide the Twitter dataset chronologically into time windows and preprocess the data within each unit time window to obtain new datasets. From these datasets we extract the time information and the content, and use the segmentation tool to segment the sentences in the content; during segmentation we remove stop words and add meaningless words to the stop-word list. We then extract burst words and use their co-occurrence to build a similarity matrix. Lastly, we record the time interval of each burst word.
Finally, taking the burst words and the similarity matrix as input, we carry out cluster analysis using a bottom-up agglomerative hierarchical clustering algorithm, which yields a binary tree of burst words. We split the binary tree with an appropriate threshold to obtain event-related clusters, then associate each event cluster with its corresponding burst time interval, and so obtain each event together with its burst time.
Based on the above work, we implemented an emergency detection system. It uses an improved BBW (Basic-Burst Weight) method to extract burst words, which improves the accuracy of burst-word extraction. We then analyze and verify the effectiveness and accuracy of the system on the Twitter dataset.
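The pipeline described in the abstract (time windows, burst words, co-occurrence similarity, bottom-up clustering with a threshold) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names, the toy posts, the simple frequency-ratio rule used in place of the BBW weighting (whose definition the abstract does not give), Jaccard similarity as the co-occurrence measure, and average linkage. Posts are assumed to be already segmented into tokens with stop words removed, and recording burst time intervals is not shown.

```python
from collections import Counter
from itertools import combinations


def split_into_windows(posts, window_seconds):
    """Group (timestamp, tokens) posts into consecutive fixed-length time windows."""
    windows = {}
    for timestamp, tokens in posts:
        windows.setdefault(int(timestamp // window_seconds), []).append(set(tokens))
    return windows


def burst_words(windows, current, min_count=2, ratio=2.0):
    """Hypothetical burst rule (NOT the thesis's BBW): keep words whose document
    frequency in the current window is at least `ratio` times their mean
    document frequency over earlier windows."""
    now = Counter(w for doc in windows[current] for w in doc)
    earlier = [k for k in windows if k < current]
    past = Counter(w for k in earlier for doc in windows[k] for w in doc)
    n = max(len(earlier), 1)
    return sorted(w for w, c in now.items()
                  if c >= min_count and c >= ratio * past[w] / n)


def cooccurrence_similarity(docs, words):
    """Jaccard similarity between the sets of posts that contain each word."""
    containing = {w: {i for i, doc in enumerate(docs) if w in doc} for w in words}
    sim = {}
    for a, b in combinations(words, 2):
        union = containing[a] | containing[b]
        sim[frozenset((a, b))] = len(containing[a] & containing[b]) / len(union) if union else 0.0
    return sim


def agglomerative_clusters(words, sim, threshold):
    """Bottom-up average-link clustering. Merging stops once the best pair of
    clusters is less similar than `threshold`, which plays the role of
    splitting the binary tree at that threshold."""
    clusters = [frozenset([w]) for w in words]

    def link(c1, c2):
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        if link(c1, c2) < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Toy data: (timestamp in seconds, tokens). Invented for illustration only.
posts = [
    (0, ["weather", "nice"]),
    (10, ["lunch", "weather"]),
    (60, ["earthquake", "tokyo", "weather"]),
    (70, ["earthquake", "tokyo"]),
    (80, ["flood", "river"]),
    (90, ["flood", "river", "weather"]),
]
windows = split_into_windows(posts, window_seconds=60)
words = burst_words(windows, current=1)
similarity = cooccurrence_similarity(windows[1], words)
events = agglomerative_clusters(words, similarity, threshold=0.5)
```

On this toy input, "weather" is frequent in both windows and is therefore not treated as a burst word, while the remaining burst words fall into two clusters, one per event. The threshold value and the linkage criterion are tuning choices; the abstract only states that an appropriate threshold is used.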
Keywords/Search Tags: microblog, time window, burst words, burst time, hierarchical clustering