Research On Key Problems Of Blog Opinion Retrieval

Posted on: 2016-01-13  Degree: Master  Type: Thesis
Country: China  Candidate: Q Li  Full Text: PDF
GTID: 2308330476454936  Subject: Library and file management
Abstract/Summary:
In recent years, the value of opinionated information on the Internet has attracted growing attention, and an increasing number of researchers are working on opinion retrieval. Blogs gather a huge amount of opinionated information, so many studies take them as a research object. How to find blogs that contain opinionated information and are consistently related to a given topic is a research focus in the fields of opinion retrieval and data mining. TREC (Text REtrieval Conference) introduced a blog opinion retrieval task in 2006, and since then more and more researchers have based their work on this platform. Although much has been achieved, some key problems remain open: how to represent a blog, how to combine the topic score with the opinion score, and how to compute the opinion score. To address these key problems of blog opinion retrieval, our work includes:

1. The Global Model is limited when a blog covers multiple topics, and the Pseudo-cluster Selection Model has a fixed parameter. To solve these problems, this paper analyses several blog representation models and uses a new model suited to the features of the task. This model represents a blog by its top k posts, and k can be adjusted according to the number of relevant posts.
2. The two-stage method cannot combine topic relevance and opinion relevance well. Building on the blog post opinion retrieval framework proposed in a previous study, this paper applies a generative model derived with a Bayesian approach to blog opinion retrieval in order to combine the topic score and the opinion score.
3. Some methods do not consider topic-specific information when estimating the opinion score. To solve this problem, this paper proposes a new opinion retrieval model. In our model, we use Pointwise Mutual Information (PMI) to expand a general sentiment lexicon with different sentiment words for different topics, and then apply pseudo-relevance feedback and a language modeling method to estimate the opinion relevance of a blog. We take the relationship between the topic and candidate words into account both when expanding sentiment words and when estimating the opinion score.

This paper verifies the above three models separately by experiments. Results show the effectiveness of the proposed approaches. Our experiments are based on the TREC 2010 datasets and topics, and our overall result is better than the best result among TREC 2010 participants. In addition, our opinion retrieval method does not need any annotated training data, so it is also applicable to other similar settings.
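The PMI-based expansion of sentiment words mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the document-level co-occurrence counting, the function names `pmi_scores` and `expand_lexicon`, and the `top_n` cutoff are all assumptions made for illustration. It uses the standard definition PMI(w, t) = log2(P(w, t) / (P(w) P(t))), with probabilities estimated as document frequencies over a small set of tokenized documents.

```python
import math


def pmi_scores(docs, topic_terms, candidates):
    """Score candidate words by document-level PMI with a topic.

    Hypothetical helper: a document "contains the topic" if it holds
    at least one of the topic terms. Probabilities are estimated as
    document frequencies over `docs` (a list of token lists).
    """
    n = len(docs)
    topic_terms = set(topic_terms)
    doc_sets = [set(d) for d in docs]
    topic_df = sum(1 for s in doc_sets if s & topic_terms)
    scores = {}
    for w in candidates:
        w_df = sum(1 for s in doc_sets if w in s)
        joint = sum(1 for s in doc_sets if w in s and s & topic_terms)
        if joint == 0 or topic_df == 0 or w_df == 0:
            continue  # PMI is undefined when there is no co-occurrence
        p_joint = joint / n
        p_word = w_df / n
        p_topic = topic_df / n
        scores[w] = math.log2(p_joint / (p_word * p_topic))
    return scores


def expand_lexicon(general_lexicon, scores, top_n=2):
    """Add the top_n highest-PMI candidates to a general lexicon (assumed cutoff)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(general_lexicon) | set(ranked[:top_n])


# Toy usage: four tiny "posts", topic "camera".
docs = [
    ["camera", "sharp", "great"],
    ["camera", "blurry"],
    ["weather", "great"],
    ["weather", "rainy"],
]
scores = pmi_scores(docs, {"camera"}, ["sharp", "blurry", "great", "rainy"])
topic_lexicon = expand_lexicon({"great"}, scores, top_n=2)
```

In this toy data, "sharp" and "blurry" occur only in posts about the camera and so receive a higher PMI than "great", which appears equally often on and off topic. A real system would estimate the counts from a much larger collection, such as the feedback documents retrieved for each topic.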
Keywords/Search Tags:opinion retrieval, language model, generative model, blog representation model, topic-specific opinion