Opinion Mining And Application In Social Media

Posted on: 2015-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S X Xie
Full Text: PDF
GTID: 1108330509461043
Subject: Computer Science and Technology
Abstract/Summary:
As social media becomes increasingly popular, more and more people express their opinions on the Web in various ways in real time. Such wide coverage of topics and abundance of users make the Web an extremely valuable source for mining people’s opinions about all kinds of topics. However, opinions are usually expressed as unstructured, noisy text fragments scattered across different sources (i.e., different users), so it is difficult for users to digest all the opinions relevant to a specific topic within a large number of text pieces. Computational methods are needed to automatically analyze, integrate and summarize the opinions articulated in all the text fragments.

This thesis focuses on the problem of automatic opinion analysis, including opinion mining, integration and summarization. The goal is to better support modeling huge amounts of opinions on all topics of interest to social media users, and further to analyze users' interaction behaviors based on these opinions. To study this problem systematically, we have identified three important steps of opinion analysis: extraction of sentiment knowledge, sentiment polarity classification of opinionated text, and integration of users' opinions. These steps form three key components of an integrated opinion summarization system, the results of which are used to support the analysis of users' online behavior. Accordingly, this thesis contributes novel and general computational techniques for four synergistic tasks:

(1) Extraction and construction of a Chinese sentiment lexicon: Sentiment lexicons are the basis of opinion analysis and play important roles in tasks such as opinionated text identification and feature selection for sentiment classification, but current lexicons are built mainly for English. There are relatively few studies on the extraction and construction of Chinese sentiment lexicons, and no comprehensive and dependable Chinese sentiment lexicon is available yet. Compiling a sentiment lexicon by hand is time-consuming and laborious, and the result has low coverage. Therefore, based on the mapping of sentiment knowledge between words of different languages, and drawing on current English sentiment lexicons, we propose a novel method that identifies Chinese sentiment words and calculates their sentiment polarity values using the bilingual semantic definitions in the HowNet knowledge resource. The result is a Chinese sentiment lexicon named Senti Hownet. To improve the coverage and domain adaptability of Senti Hownet, we analyzed and experimentally verified a language-rule-based extension method and a corpus-based extension method that uses statistical context features, and we propose a hybrid method that combines the two. The Senti Hownet lexicon is constructed automatically without human annotation, and has wider coverage and better adaptability for domain opinion analysis than other Chinese sentiment lexicons.

(2) Sentiment polarity classification based on feature space division: Sentiment classification assigns text to predefined categories according to feature co-occurrence, and can be regarded as a special kind of text classification. The bag-of-words features used in sentiment classification serve different functions: some features represent the same general sentiment polarity across domains and contexts, while others represent a specific sentiment polarity only in a particular domain or context. Therefore, we propose to divide the feature space of the sentiment classification task into two separate parts, a domain-dependent part and a domain-independent part. Two different classifiers are learned from the two feature parts and then combined into a stronger sentiment polarity classifier within a bootstrapping framework. The framework starts training from an off-the-shelf idiom resource, without annotation, in a bootstrapping way. The proposed method can achieve the performance of supervised methods without any annotated dataset.

(3) Integration of users’ opinions: User-generated content (UGC) on social media often consists of short, dispersed text fragments, so users' opinions about topics of interest are scattered across unstructured, fragmented short texts. To digest users' opinions comprehensively and accurately, we propose the concept of a subjectivity model that combines topics and opinions, in which opinions are integrated according to the different aspects of the same topic articulated in the UGC. We also put forward a general representation of opinion, which defines an opinion as a sentiment distribution over a scalable sentiment value space and provides a more detailed and informative multi-perspective view of the opinions.

(4) Analysis of users' interaction behaviors: As a direct application of the subjectivity model, we analyze the subjective motivation behind the information dissemination behavior of social media users. We consider three scenarios in which a Twitter user retweets a message: the user is interested in and attracted by the message content; the user retweets a close friend's message out of social needs; and the user retweets out of conformity because the message is popular. For these scenarios we propose three subjectivity similarity measurements. The three subjectivity similarities are verified to be correlated with retweeting behavior and can serve as useful features for retweeting behavior prediction, where they significantly improve the performance of existing prediction models.

We focus on general and robust methods that require minimal human supervision, so that the automated methods are applicable to a wide range of topics and scalable to large amounts of opinions. This focus differentiates the thesis from work that is fine-tuned or well-trained for particular domains but is not easily adaptable to new domains. Our main idea is to exploit naturally available resources, such as off-the-shelf lexicons, which can serve as indirect signals and guidance for opinion analysis. Along this line, our proposed techniques have been shown to be effective and general enough to be applied to potentially many interesting applications in multiple domains, such as business intelligence and sociological research.
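
The opinion representation and the subjectivity similarity described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the five-point sentiment value space, the function names, and the topic-weighted cosine similarity are hypothetical and are not the measurements proposed in the thesis.

```python
from math import sqrt

# Hypothetical sentiment value space: five points from very negative to very positive.
# The abstract only says the space is "scalable"; this particular scale is an assumption.
SENTIMENT_VALUES = [-2, -1, 0, 1, 2]


def opinion_distribution(scores):
    """Represent an opinion as a probability distribution over SENTIMENT_VALUES."""
    if not scores:
        raise ValueError("at least one sentiment score is required")
    counts = {value: 0 for value in SENTIMENT_VALUES}
    for score in scores:
        counts[score] += 1  # each score must be a member of SENTIMENT_VALUES
    total = len(scores)
    return [counts[value] / total for value in SENTIMENT_VALUES]


def build_subjectivity_model(fragments):
    """Build {topic: (topic_weight, opinion_distribution)} for one user.

    fragments is an iterable of (topic, sentiment_score) pairs, one per
    short text fragment that has already been assigned a topic and a score.
    """
    scores_by_topic = {}
    for topic, score in fragments:
        scores_by_topic.setdefault(topic, []).append(score)
    total = sum(len(scores) for scores in scores_by_topic.values())
    return {
        topic: (len(scores) / total, opinion_distribution(scores))
        for topic, scores in scores_by_topic.items()
    }


def cosine(p, q):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def subjectivity_similarity(model_a, model_b):
    """Assumed measure: topic-weighted cosine similarity over shared topics.

    Returns a value in [0, 1]: 0 when no topic is shared, 1 when both
    models have identical topic weights and identical opinions.
    """
    similarity = 0.0
    for topic in model_a.keys() & model_b.keys():
        weight_a, opinion_a = model_a[topic]
        weight_b, opinion_b = model_b[topic]
        similarity += sqrt(weight_a * weight_b) * cosine(opinion_a, opinion_b)
    return similarity
```

In the retweeting setting, a score of this kind could be computed between a user's model and a model built from a message, from a friend's posts, or from popular content, and then passed to a prediction model as a feature. That mapping onto the three scenarios is likewise only one possible reading of the abstract.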
Keywords/Search Tags: Social Media, Sentiment lexicon, Sentiment classification, Opinion integration, Information dissemination