Research On Hot Event Detection In Micro-blog Based On Topic Model And Community Discovery

Posted on: 2015-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
GTID: 2268330428980820
Subject: Computer software and theory
Abstract/Summary:
With its simple and efficient mechanism for generating and propagating information, micro-blogging, an emerging social network service, has become ubiquitous in the Web 2.0 era. Compared with traditional media, micro-blogs broadcast and spread news in a more timely and efficient way, so event detection in micro-blogs has become a hot research topic in recent years. Nevertheless, several characteristics of micro-blogs make event mining challenging. First, a micro-blog data stream contains large amounts of valueless and meaningless messages, also called noise messages, so a major challenge is how to separate messages that refer to real-world events from the large volume of noise. Second, a micro-blog message has no more than 140 characters; this sparsity, together with the large number of spelling and grammatical errors and the use of mixed languages, makes traditional text analysis techniques less suitable for event detection in micro-blogs.

This paper first surveys domestic and international work on event detection in micro-blogs and then, in response to the deficiencies of existing techniques, studies and extends both static and dynamic event detection. For static event detection, it proposes a text classification method based on a topic model and a Bayesian method to detect event micro-blogs in static micro-blog data. The method maps each message to a representation in topic space and mines the relationships between topics and message categories; the category of a message is then determined by whether the categories of its topics are event or non-event.
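The static method described above can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: it assumes that each message already has a topic mixture (for example from an LDA-style topic model), that a small labeled set is available, and that P(category | topic) is estimated from the labeled topic mass with add-one smoothing. The function names, the `smoothing` parameter, and the "event"/"noise" labels are all hypothetical.

```python
from collections import defaultdict


def topic_category_posteriors(topic_mixtures, labels, smoothing=1.0):
    """Estimate P(category | topic) from labeled messages.

    topic_mixtures: list of dicts {topic_id: weight}, one per message
                    (assumed to come from a separately trained topic model).
    labels:         list of category names, one per message.
    """
    categories = sorted(set(labels))
    # mass[topic][category] accumulates the topic weight seen under each category.
    mass = defaultdict(lambda: defaultdict(float))
    for mixture, label in zip(topic_mixtures, labels):
        for topic, weight in mixture.items():
            mass[topic][label] += weight

    posteriors = {}
    for topic, by_category in mass.items():
        total = sum(by_category.values()) + smoothing * len(categories)
        posteriors[topic] = {
            c: (by_category.get(c, 0.0) + smoothing) / total for c in categories
        }
    return posteriors


def classify(mixture, posteriors, default="noise"):
    """Pick the category with the largest topic-weighted posterior score."""
    scores = defaultdict(float)
    for topic, weight in mixture.items():
        for category, p in posteriors.get(topic, {}).items():
            scores[category] += weight * p
    if not scores:
        # None of the message's topics were seen in training.
        return default
    return max(scores, key=scores.get)
```

In this sketch a message dominated by topics that mostly occur in event messages is labeled as an event, which mirrors the idea of deciding a message's category from the categories of its topics.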
For dynamic event detection, this paper presents a method based on community discovery and a kernel method to detect events in a dynamic micro-blog data stream. The method first selects event words using a dynamic event word selection algorithm, which is also proposed in this paper. It then builds a semantic graph for the micro-blog data in each time slice: each node is a micro-blog message, and an edge between two message nodes means that the two messages share event words. A community discovery algorithm is then used to discover event communities in each time slice, and the key message node of each event community is returned as a description of the event that the community reflects. This paper also presents a semantic encoding scheme that generates a binary array label for each message node in each event community. A graph kernel method then computes the similarity between labeled event community graphs in consecutive time slices, and the results are used to match event communities that reflect the same event, which makes it possible to track events.

The experimental data are Chinese micro-blog messages crawled in real time. Both methods were applied to detect events in these data, and the experimental results show that they achieve the desired effects.
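The dynamic pipeline described above can be sketched roughly as follows. Several simplifications are assumptions rather than the thesis's method: event words are taken as given, connected components stand in for the community discovery algorithm, the key node is simply the highest-degree node, the binary label has one bit per event word, and a normalized vertex-label histogram kernel stands in for the graph kernel. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_semantic_graph(messages, event_words):
    """messages: list of token lists for one time slice.

    Returns (hits, adjacency): the event words found in each message, and
    an adjacency dict over the messages that contain any event word.
    """
    event_words = frozenset(event_words)
    hits = [frozenset(tokens) & event_words for tokens in messages]
    adjacency = {i: set() for i, h in enumerate(hits) if h}
    for i, j in combinations(adjacency, 2):
        if hits[i] & hits[j]:  # edge: the two messages share an event word
            adjacency[i].add(j)
            adjacency[j].add(i)
    return hits, adjacency


def communities(adjacency):
    """Connected components, used here as a simple stand-in for community discovery."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


def key_node(component, adjacency):
    """Highest-degree node inside the community (ties go to the lowest index)."""
    return max(sorted(component), key=lambda n: len(adjacency[n] & component))


def binary_label(hit, vocabulary):
    """Assumed encoding: one bit per event word in a fixed vocabulary order."""
    return tuple(int(word in hit) for word in vocabulary)


def label_histogram_kernel(labels_a, labels_b):
    """Normalized vertex-label histogram kernel between two labeled communities."""
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    dot = sum(count_a[label] * count_b[label] for label in count_a)
    norm = sqrt(
        sum(v * v for v in count_a.values()) * sum(v * v for v in count_b.values())
    )
    return dot / norm if norm else 0.0
```

Under these assumptions, two communities from consecutive time slices would be matched as the same event when their kernel value exceeds some chosen threshold; the threshold itself would need to be tuned on real data.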
Keywords/Search Tags: Micro-blog, Event Detection, Topic Detection, Topic Model, Community Discovery, Graph Kernel