Design And Implementation Of Microblog Public Opinion Analysis System

Posted on: 2016-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: X F Chen
Full Text: PDF
GTID: 2308330479993914
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet, microblogging, a new kind of social networking platform, has come into wide use all over the world. A microblog is no longer only a platform that provides social networking services; it has also become an important virtual space where online public opinion emerges and spreads easily. Against this background, researchers have gradually recognized the significance of microblog public opinion analysis.

Microblog messages are short, spread fast, and quickly accumulate into massive datasets, so microblog public opinion analysis differs from traditional public opinion analysis. In this paper, we study the key technologies of network public opinion analysis, take into account the features of microblog messages, and then design and implement a microblog public opinion analysis system.

The system has the following features. First, we fetch microblog data in real time through the weibo API and adopt a filtering strategy to improve the quality of the microblog dataset. Second, by calling the API of the NLPIR system, we perform Chinese word segmentation and Chinese part-of-speech (POS) tagging on the microblog texts, and we add a self-built segmentation dictionary so that segmentation results can be tailored to the user's needs. Third, we develop a stop-word removal strategy based on three criteria: word length, word POS, and a stop-word dictionary. Fourth, we build an LDA topic model on the microblog text set and select the optimal number of topics by calculating the perplexity of the candidate models. Fifth, we perform k-means cluster analysis on the microblog text set using the Jensen-Shannon distance, and we tie the value of k to the optimal number of topics, considering the latent relationship between the number of topics and the number of clusters. Finally, we extract public opinions from the microblog text clusters based on the latent topic information, and we find and rank hotspots by calculating the growth rate of microblog data in each cluster.
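
The last two steps of the pipeline can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives neither the exact form of the Jensen-Shannon measure nor the growth-rate formula, so the base-2 square-root form of the distance, the relative-growth definition, and the names `js_distance`, `growth_rate`, and `rank_hotspots` are all assumptions made for illustration. Each document is assumed to be represented by its LDA topic distribution.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance between two topic distributions.

    Assumption: the square root of the base-2 JS divergence, which
    lies in [0, 1]. The thesis may use a different variant.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def growth_rate(previous_count, current_count):
    """Relative growth of a cluster between two time windows.

    Hypothetical definition: (current - previous) / previous, with the
    denominator floored at 1 so new clusters do not divide by zero.
    """
    return (current_count - previous_count) / max(previous_count, 1)


def rank_hotspots(cluster_counts):
    """Order cluster ids by growth rate, fastest-growing first.

    cluster_counts maps cluster id -> (previous_count, current_count).
    """
    return sorted(
        cluster_counts,
        key=lambda cid: growth_rate(*cluster_counts[cid]),
        reverse=True,
    )
```

In a k-means loop, `js_distance` would replace the Euclidean distance when assigning each document's topic distribution to its nearest centroid, and `rank_hotspots` would then be applied to the per-cluster message counts from two consecutive time windows.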
Keywords/Search Tags: microblog public opinion, weibo API, Chinese word segmentation, LDA model, topic detection