
Research on Public Event Detection Algorithms for Personal Weibo

Posted on: 2015-05-14    Degree: Master    Type: Thesis
Country: China    Candidate: H Y Zhou
GTID: 2298330422490289    Subject: Computer technology
Abstract
With the rapid development of computer application technology, Internet media have risen quickly and now shape people's daily lives. They have become a news carrier alongside television, newspapers, radio and other traditional media. Information spreads rapidly across the Internet, so it is diverse, open and real-time. As a result, Internet community platforms play an important role in disseminating hot events as they happen. Sina Weibo is a typical example: a new online medium that has risen and grown rapidly in recent years. Users can post status updates and share information anytime and anywhere through web pages, mobile clients and other channels. Sina Weibo is currently the most popular microblogging site and has the largest user base. Statistics from July 2013 show that its registered users had reached 330 million, producing a huge volume of microblog data.

Microblog data are irregular, massive and real-time. The primary problem that personal microblog information detection must therefore solve is how to accurately extract, from a large volume of irregular personal microblog data, the public events a user paid attention to within a given period. This thesis uses personal microblog data as its experimental samples. Its main work is to detect, from a user's own posts, which public events that user followed within a given time window. Repeated experiments show that traditional event extraction algorithms give unsatisfactory results when applied to personal microblogs.
Therefore, the thesis tried and tested a series of algorithms over many experiments and took into account the non-standard nature of personal microblog text. Working within short-text data mining, it focuses on keyword extraction. It carries out a series of studies covering text acquisition, preprocessing, similarity measurement, feature-value calculation and, finally, forward and reverse matching against common templates. The result is a rational and complete workflow for detecting public events in personal microblogs. This workflow is organized into three modules: text preprocessing, keyword identification and common template matching.

Noise preprocessing mainly cleans the text so that it is represented in a more standardized form. Keyword extraction combines three similarity measures (coupling, timing and popularity) with a TF-IDF function. This combination fits the characteristics of the experimental data and also improves keyword extraction accuracy. Common template matching compares the extracted keywords with event templates taken from the Sina Weibo trending list in two steps, first forward and then reverse, to produce the final public event detection results.
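The three-module workflow can be illustrated with a minimal Python sketch. It rests on several assumptions, none of which come from the thesis.

- The noise rules (URLs, @mentions, bracketed emoticon codes, `#` markers) are guesses.
- Keywords are ranked by TF-IDF alone, using the smoothed IDF form that scikit-learn's `TfidfVectorizer` uses. The coupling, timing and popularity similarities are not defined in this abstract, so the sketch omits them.
- "Forward" matching is read as retrieving trending-list events that share a keyword. "Reverse" matching is read as keeping only events whose own terms are mostly covered by the keywords.
- Posts are assumed to be already word-segmented. Real Chinese Weibo text would first need a word segmenter. English tokens are used here only for readability.
- The names `preprocess`, `tfidf_keywords`, `match_events` and the `min_coverage` threshold are hypothetical.

```python
import math
import re
from collections import Counter

# Noise patterns for Weibo posts (assumed; the thesis does not list its rules).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")
EMOTICON_RE = re.compile(r"\[[^\]]{1,8}\]")  # bracketed emoticon codes such as "[cry]"


def preprocess(post):
    """Strip URLs, @mentions, emoticon codes and '#' markers; collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOTICON_RE):
        post = pattern.sub(" ", post)
    post = post.replace("#", " ")
    return " ".join(post.split())


def tfidf_keywords(posts, top_k=5):
    """Rank terms over one user's posts by TF-IDF summed across posts.

    Each post is treated as one document and is assumed to be already
    word-segmented (space-separated tokens)."""
    docs = [preprocess(p).lower().split() for p in posts]
    docs = [d for d in docs if d]
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    scores = Counter()
    for d in docs:
        for term, count in Counter(d).items():
            idf = math.log((n + 1) / (df[term] + 1)) + 1  # smoothed IDF
            scores[term] += (count / len(d)) * idf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def match_events(keywords, hot_list, min_coverage=0.5):
    """Hypothetical two-step matching of keywords against a trending list.

    Forward step: keywords -> candidate events sharing at least one term.
    Reverse step: keep a candidate only if at least `min_coverage` of the
    event's own terms appear among the keywords."""
    kw = set(keywords)
    candidates = [e for e in hot_list if kw & set(e.lower().split())]
    matched = []
    for event in candidates:
        terms = set(event.lower().split())
        if len(kw & terms) / len(terms) >= min_coverage:
            matched.append(event)
    return matched


# Usage with made-up posts and a made-up trending list.
posts = [
    "earthquake rescue teams arrive http://t.cn/abc @news",
    "#earthquake# donations open [cry]",
    "lunch with friends",
    "earthquake rescue continues",
]
keywords = tfidf_keywords(posts, top_k=2)
events = match_events(keywords, ["earthquake rescue", "earthquake relief fund", "football final"])
print(keywords, events)
```

Here "earthquake" ranks first even with the lowest IDF because it appears in three posts. The reverse step drops "earthquake relief fund" because only one of its three terms is a keyword.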
Keywords: Weibo, Subject Term, TF-IDF, Public Event Detection