A Study of Hot Events Detection Based on Short Text in Public Opinion Analysis System

Posted on: 2012-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: J L Bai
GTID: 2298330335460477
Subject: Signal and Information Processing
Abstract/Summary:
The rapid development and popularity of the Internet have created enormous economic and social benefits. At the same time, the Internet has also spread a variety of harmful information and irregular behavior. Internet public opinion analysis is an important way to address these issues and maintain the healthy development of the Internet. Topic detection is one of the core technologies of a public opinion analysis system, and topic detection based on rich text is relatively mature. However, with the rapid development of new media forms in recent years, such as micro-blogs and BBS, short text has come to occupy an important position on the Internet. The most important difference between short text and rich text is that short text does not provide enough features. Therefore, methods for processing short text must differ from those for rich text.

Short text processing techniques vary widely. In this paper, a method based on the content backbone of a short text is used to compare the similarity of short texts and thereby detect near-duplicates. Word-level clustering is used to combine a number of key words into an event. Hot events are identified based on burst terms, term co-occurrence and a generative probabilistic model; the Boolean model and an inverted index are used to measure the correlation between words. Short texts are then classified into these hot events, and the number of short texts in each hot event describes the degree of public concern.

The experimental data used in this paper is a real-time text stream (micro-blogs, news titles, etc.) crawled from the Internet. Experiments with huge text stream sets suggest that the algorithm can work online and identify hot events effectively and efficiently. This paper also designs a distributed architecture for detecting near-duplicate short texts and a system for rapidly discovering hot events in a short text stream.
Keywords/Search Tags: short text, hot events detection, clustering, public opinion
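
The pipeline described in the abstract (burst terms, Boolean co-occurrence over an inverted index, word-level clustering, then assigning short texts to events) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the whitespace tokenizer, the burst ratio, the Jaccard overlap used as the co-occurrence measure, and the connected-component clustering are all assumptions chosen for illustration, and the generative probabilistic model is not covered.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(texts):
    """Map each term to the set of ids of the short texts containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for term in set(text.lower().split()):  # assumed: whitespace tokens
            index[term].add(doc_id)
    return index


def burst_terms(prev_index, curr_index, min_docs=2, ratio=2.0):
    """Terms whose document frequency jumped between two time windows.

    Assumed rule: current frequency / (previous frequency + 1) >= ratio.
    """
    bursts = set()
    for term, postings in curr_index.items():
        curr_df = len(postings)
        prev_df = len(prev_index.get(term, ()))
        if curr_df >= min_docs and curr_df / (prev_df + 1) >= ratio:
            bursts.add(term)
    return bursts


def cooccurrence(index, a, b):
    """Boolean co-occurrence: Jaccard overlap of two posting sets."""
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def cluster_terms(index, terms, threshold=0.5):
    """Group terms into events: connected components of the graph that
    links two terms when their co-occurrence reaches the threshold."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b in combinations(sorted(terms), 2):
        if cooccurrence(index, a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return list(groups.values())


def assign_texts(index, event_terms, min_hits=2):
    """Ids of the texts that contain at least min_hits of the event terms."""
    hits = defaultdict(int)
    for term in event_terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return {doc_id for doc_id, n in hits.items() if n >= min_hits}


# Made-up example data: a previous and a current time window of short texts.
previous = ["nice weather today", "football match tonight", "new phone released"]
current = [
    "earthquake hits coastal city",
    "coastal city earthquake rescue underway",
    "earthquake rescue teams reach city",
    "football match tonight",
    "new phone price drops",
    "phone price drops again",
]

prev_index = build_inverted_index(previous)
curr_index = build_inverted_index(current)
bursts = burst_terms(prev_index, curr_index)
events = cluster_terms(curr_index, bursts)

# Rank events by how many short texts fall into them (the "heat").
ranked = sorted(
    ((assign_texts(curr_index, e), e) for e in events),
    key=lambda pair: len(pair[0]),
    reverse=True,
)
for docs, terms in ranked:
    print(len(docs), sorted(terms))
```

On this toy data the sketch groups the earthquake-related terms into one event covering three texts and the price-related terms into a second event covering two. The thresholds are arbitrary and would need tuning on a real text stream.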