
Research On Key Techniques Of Opinion Mining For Online Public Opinion Analysis

Posted on: 2012-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Feng
Full Text: PDF
GTID: 1228330467981146
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Web 2.0 technology, the role of Internet users has changed from passive readers to active contributors. More and more people publish their opinions and attitudes on Web 2.0-based systems such as blogs, online forums, review websites and newsgroups. Faced with this growing volume of public opinion information, it is difficult to collect, arrange and process the data manually, and computer-aided analysis is indispensable for monitoring public opinion on the Web. Unlike methods designed for product reviews, opinion mining techniques for public opinion analysis must handle massive and varied Web data with fine-grained and mixed emotions. How to mine opinion information from different types of Web data has therefore become the major concern of online public opinion analysis.

To address the key techniques of opinion mining for online public opinion analysis, this dissertation investigates the characteristics of different kinds of Web data and studies public opinion extraction and summarization, emotion similarity measurement, opinion-oriented sentence compression and opinion community discovery. The major studies of this dissertation include:

(1) For blog search results, a lexicon-based method is proposed to enrich the representation of the results, and a spectral clustering algorithm is introduced to partition them into opinion groups, which helps to reveal opinion distributions on the Web. A mutual reinforcement random walk model is proposed to rank result items and extract key sentiment words simultaneously, which helps users quickly grasp the typical opinions on a given topic. Extensive experiments with different query words are conducted on a real-world blog search engine, and the experimental results verify the efficiency and effectiveness of the proposed model and methods.

(2) For short texts in user-generated content, an enhanced emotion vector based method is proposed to measure the emotion similarity between online short texts. The hidden emotion state distribution is learned from a Chinese blog emotion corpus using a probabilistic topic model. With the help of this distribution, the weights of emotions in the original sentence pair are enhanced. The enhanced emotion vector representation considers not only the hidden emotion relationship between the short texts but also other emotion indicators, such as degree words and punctuation. Experimental results on Chinese blog sentences show that the proposed method outperforms word-based vectors and emotion state based vectors.

(3) For long Chinese sentences in news articles, a score function is proposed that considers word importance, grammatical consistency and opinion strength. A weakly supervised Chinese sentence compression method is proposed that aims to eliminate the negligible factual parts of a sentence and preserve its core opinionated parts. No parallel corpus is needed during compression. Experiments involving both automatic evaluation and human subjective evaluation validate that the proposed method is effective in finding the desired parts of long Chinese sentences, and it paves the way for further opinion mining steps.

(4) For tagging and rating data, this dissertation defines tagging and rating information as users' opinions on certain items, and users with similar opinions in online rating systems have an underlying virtual opinion community structure. Items, tags and latent friends are recommended based on the assumption that members of the same opinion community have similar hobbies and interests. Experiments conducted on real-world data validate that the proposed method can effectively recommend items, tags and friends that users are interested in.

(5) For different kinds of Web data, a public opinion search engine, POSearcher, is designed and implemented. POSearcher uses a novel "topic-opinion" word pair as the basic storage unit for online public opinion. The "topic-opinion" word pairs and the Web documents are unified into a two-layer graph model, and an extended HITS algorithm is introduced to rank the indexed information in the POSearcher system, which provides a new tool for online public opinion retrieval, analysis and monitoring.

In summary, this dissertation is dedicated to studying fundamental problems related to opinion mining techniques for online public opinion analysis, such as summarizing public opinion from blogs, measuring emotion similarity for short texts, compressing long Chinese sentences for opinion mining tasks and discovering virtual opinion communities in rating systems. Theoretical analysis and experiments show that these approaches are efficient and effective. We hope that these approaches and techniques can contribute to online public opinion analysis systems.
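
The ranking idea in (5) can be illustrated with a minimal sketch of a standard HITS-style mutual reinforcement iteration on a two-layer graph of documents and "topic-opinion" word pairs. This is not the extended HITS algorithm of the dissertation, whose details the abstract does not give. The function names, the example documents and the word pairs are all hypothetical and chosen only for illustration.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op guard for all-zero input)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits_two_layer(doc_pairs, iterations=50):
    """Plain HITS-style scoring on a document / word-pair bipartite graph.

    doc_pairs maps a document id to the set of (topic, opinion) pairs found
    in it. Returns (doc_score, pair_score). Assumption: every edge has
    weight 1; the dissertation's extension may weight edges differently.
    """
    docs = list(doc_pairs)
    pairs = sorted({p for ps in doc_pairs.values() for p in ps})
    doc_score = {d: 1.0 for d in docs}

    pair_score = {p: 1.0 for p in pairs}
    for _ in range(iterations):
        # A word pair scores highly if it occurs in high-scoring documents.
        new_pair = {p: 0.0 for p in pairs}
        for d, ps in doc_pairs.items():
            for p in ps:
                new_pair[p] += doc_score[d]
        # A document scores highly if it contains high-scoring word pairs.
        new_doc = {d: sum(new_pair[p] for p in doc_pairs[d]) for d in docs}
        pair_score = _normalize(new_pair)
        doc_score = _normalize(new_doc)
    return doc_score, pair_score


# Hypothetical toy data: three documents and the word pairs they contain.
example_docs = {
    "doc1": {("phone", "great"), ("battery", "poor")},
    "doc2": {("phone", "great")},
    "doc3": {("phone", "great"), ("screen", "bright")},
}
doc_score, pair_score = hits_two_layer(example_docs)
```

In this toy graph the pair shared by all three documents receives the highest score, and documents containing more pairs outrank the document containing only one, which is the mutual reinforcement effect the two-layer model relies on.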
Keywords/Search Tags: Online public opinion analysis, opinion mining, sentiment analysis, social network analysis, similarity measurement, sentence compression, personalized recommendation