
Research And Implementation Of Opinion Summarization On Microblog Data Stream

Posted on: 2015-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2348330482952597
Subject: Computer application technology
Abstract/Summary:
As an increasingly popular Web 2.0 application, the microblog has become an important platform for people to record events and share personal views in daily life, and it has attracted growing attention from both ordinary users and the research community. Weibo users can log in to the platform from fixed or mobile clients. With "post", "forward", and other functions, people can express their feelings and wishes conveniently and quickly. Because microblogs are convenient and easy to read, they spread widely in a short time and exhibit distinctive information transmission features: high speed, large volume, strong real-time behavior, and great variety. These characteristics make processing microblog data highly challenging. At the same time, Weibo users are eager to learn the public opinion and the tendency of viewpoints on a given event simply, quickly, and in real time. To tackle these challenges, we treat the Weibo data flow as a stream and propose novel opinion summarization techniques that produce a summary of views in a timely and effective manner.

To this end, firstly, a real-time incremental clustering method based on data stream clustering techniques is applied to the target data stream, grouping posts by topic and generating dynamic topic clusters that change over time. The experimental results show that, with an appropriate range of parameters, the proposed method obtains a stable clustering result for a given time and the topic clusters are clearly separated.

Secondly, a topic-opinion tree based on the microbloggers' emotions is built in each topic cluster; it can be regarded as a compressed storage structure for topic and opinion information. In this process, to take the characteristics of data streams fully into account, dynamic analysis of frequent itemsets is used to maintain the tree. In the related experiments, topic-opinion trees are built successfully, and the size of each tree is controlled effectively by frequent itemset mining.

Finally, the longest phrase is extracted from each topic-opinion tree to represent the main idea of its cluster, and the phrases from all clusters are combined as the final opinion summary. In a comparison with a relevant method, three-fifths of the reviewers judged the results generated by the proposed method to be better, which reflects the rationality of the processing model proposed in this thesis.

In conclusion, the workflow and methods proposed in this thesis fully consider real-time characteristics, and the methods cover the topics well. The successful construction of the topic-opinion tree solves the compressed storage problem for topics and opinions and yields good results with an acceptable loss of accuracy. The final analysis results are highly readable and representative.
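The abstract does not specify the clustering algorithm, so the following is only a minimal sketch of what a single-pass incremental topic clustering step over a post stream could look like. The `StreamClusterer` class, the whitespace tokenizer, the cosine similarity measure, and the `threshold` value are all illustrative assumptions, not the thesis's actual method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamClusterer:
    """Hypothetical single-pass clusterer: each post is seen exactly once."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # summed term counts, one Counter per cluster
        self.members = []           # posts assigned to each cluster

    def add(self, post):
        """Assign a post to the most similar cluster, or open a new one."""
        vec = Counter(post.lower().split())  # naive tokenizer (assumption)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Update the cluster summary incrementally instead of re-clustering.
            self.centroids[best].update(vec)
            self.members[best].append(post)
            return best
        self.centroids.append(vec)
        self.members.append([post])
        return len(self.centroids) - 1


clusterer = StreamClusterer(threshold=0.3)
for text in ["earthquake hits city center",
             "strong earthquake hits the city",
             "new phone release price"]:
    print(clusterer.add(text), text)
```

Because each cluster is summarized by a running term-count vector, a new post costs one comparison per existing cluster, which is what makes the approach suitable for a stream. A real system would also need to expire old clusters over time, which this sketch omits.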
Keywords/Search Tags: microblog flow, opinion summarization, topic cluster, topic-opinion tree