| As a new platform for information dissemination and sharing, microblogging is highly interactive and real-time. Using computers or mobile devices, people can share events happening around them and post comments anytime and anywhere. Many hot events and topics are first discussed and spread on microblogging platforms, which has gradually made microblogging a vital information source in people’s daily lives. However, microblog posts are limited to 140 characters, so the information they convey is often fragmentary. Such fragmented dissemination makes it difficult for users to gain a detailed understanding of an event. Although many microblogging platforms, such as Twitter and Sina Weibo, provide a search service, the results are sorted only by recency rather than relevance. Users are forced to read through the posts manually to find what they care about, which is difficult and time-consuming. It is therefore necessary to build a system on top of microblogging that helps users gain a detailed understanding of an event in a short amount of time. In this paper, we present WeiboInfo, a timeline-based prototype system for visualizing and summarizing events on Sina Weibo. WeiboInfo shows users a timeline of the number of relevant tweets and uses a self-adaptive peak detection algorithm to discover peaks. We treat the peaks as subevents and automatically summarize them to give users more detailed and straightforward information. WeiboInfo also provides views of relevant tweets, a tweet map, popular URLs, and tweet sentiment analysis for further browsing, enabling users to understand an event in a short amount of time. There have been several studies on microblogging visualization, but they all focus on Twitter and deal with English text. In this paper, we focus on Sina Weibo and study microblogging visualization and summarization for Chinese. 
So far, there are no similar applications for Chinese microblogging. For the Sina Weibo platform, we design a web crawler that crawls keyword-related web pages, extracts the related tweets, and converts them into structured data stored in a database. We also use a support vector machine to crack the Sina Weibo CAPTCHA to enable continuous crawling. For word segmentation, we use the open-source tool NLPIR for Chinese processing, and we apply a revised TF-IDF algorithm for tweet summarization. We also use a 2-level SVM-based approach for sentiment analysis of Weibo texts to capture public attitudes toward an event. Finally, we use the Google Maps API to provide a tweet map display that lets users intuitively see the geographical areas affected by an event. |
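
To make the TF-IDF step more concrete, the following is a minimal sketch of how the terms of one peak might be ranked against the other time windows. It uses plain TF-IDF, not the revised variant mentioned above, whose details are not given here. The function name `top_terms`, the window layout, and the sample tokens are all assumptions for illustration, and the input is assumed to be already segmented into words (for example by NLPIR).

```python
import math
from collections import Counter


def top_terms(windows, peak_index, k=3):
    """Rank the terms of one time window by plain TF-IDF against all windows.

    windows: list of time windows, each a list of already-segmented tokens
             (hypothetical layout, chosen for this sketch).
    peak_index: index of the window that was detected as a peak.
    """
    n_windows = len(windows)

    # Document frequency: in how many windows each term appears.
    df = Counter()
    for tokens in windows:
        df.update(set(tokens))

    # Term frequency inside the peak window only.
    tf = Counter(windows[peak_index])
    total = len(windows[peak_index])

    scores = {
        term: (count / total) * math.log(n_windows / df[term])
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy data: three time windows, the second one being the peak.
windows = [
    ["rain", "traffic", "news"],
    ["earthquake", "earthquake", "rescue", "news"],
    ["traffic", "news", "music"],
]
print(top_terms(windows, peak_index=1, k=2))
```

In this toy input, terms that occur in every window (such as `news`) receive a score of zero, so the terms specific to the peak rise to the top. A real summarizer would then have to map the ranked terms back to representative tweets, which this sketch does not attempt.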