
Research Of Text Sentiment Classification Based On Semantic Comprehension And PLSA

Posted on: 2013-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: W J Hu
GTID: 2248330371991313
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of e-commerce, social networking services (SNS) and microblogs, the Internet has entered a new age. People now have space to express their views, and an overwhelming volume of opinions and comments has followed. This vast amount of unstructured text contains a great deal of information: companies need to collect comments on their products, governments need to know how the public reacts to a particular policy, and users want more suggestions before they shop. How to process this information has therefore become a focus of current research. Sentiment classification assigns a text to one of two classes, positive or negative, by mining and analyzing the subjective information it contains, such as standpoints, views and emotions. It can be applied to public opinion analysis, information filtering, product evaluation and recommendation, intelligent search and user interest mining.

The main work of this paper is summarized as follows. First, we established a cross-domain corpus and built a more detailed sentiment lexicon based on HowNet's sentiment words. The polarity of words not listed in the lexicon was computed from semantic similarity, and texts were then classified according to the orientation of their words. Second, we proposed a new sentiment classification model based on Probabilistic Latent Semantic Analysis (PLSA). It uses a probabilistic model to represent the "documents-latent semantics-words" relationship, so documents and words are mapped into the same semantic space, which addresses the problems of polysemy and synonymy. Fitting the model with the EM algorithm also reduces time and space complexity.

The method based on semantic comprehension is highly efficient but adapts poorly: each domain has its own sentiment words, and some words take different orientations in different domains. In addition, people often use IENS or irony to express negative emotion.
Therefore, this method tends to classify texts as positive. On the other hand, the method based on PLSA requires a large tagged corpus and long training and classification times, but has broader applicability. Because both positive and negative words occur frequently in negative texts, it tends to classify texts as negative. In this paper, we combined the two methods into a self-supervised model, and experiments show a precision of more than 90%.
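The lexicon-based step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the tiny English seed lexicon, the character-bigram Dice similarity (standing in for the HowNet-based semantic similarity, which is not reproduced here), the orientation formula and the 0.1 neutrality threshold are all hypothetical choices. An unlisted word's orientation is taken as its average similarity to positive seeds minus its average similarity to negative seeds, and a text is labeled by the sign of its summed word orientations.

```python
from typing import Callable, Dict, List

# Hypothetical seed lexicon standing in for HowNet sentiment words:
# +1 marks a positive word, -1 a negative word.
SEED_LEXICON: Dict[str, int] = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def bigram_dice(a: str, b: str) -> float:
    """Placeholder similarity: Dice coefficient over character bigrams.

    Assumption: stands in for a HowNet-based semantic similarity."""
    def bigrams(s: str) -> set:
        grams = {s[i:i + 2] for i in range(len(s) - 1)}
        return grams or {s}  # single-character words fall back to themselves

    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def word_polarity(word: str,
                  lexicon: Dict[str, int] = SEED_LEXICON,
                  sim: Callable[[str, str], float] = bigram_dice,
                  threshold: float = 0.1) -> float:
    """Lexicon polarity if the word is listed, else a similarity-based orientation."""
    if word in lexicon:
        return float(lexicon[word])
    pos = [sim(word, w) for w, p in lexicon.items() if p > 0]
    neg = [sim(word, w) for w, p in lexicon.items() if p < 0]
    # Average similarity to positive seeds minus average similarity to negative seeds.
    orientation = (sum(pos) / len(pos) if pos else 0.0) - \
                  (sum(neg) / len(neg) if neg else 0.0)
    # Weakly oriented words are treated as neutral to limit noise.
    return orientation if abs(orientation) >= threshold else 0.0


def classify(tokens: List[str], **kwargs) -> str:
    """Label a tokenized text by the sign of its summed word polarities."""
    score = sum(word_polarity(t, **kwargs) for t in tokens)
    return "positive" if score > 0 else "negative"
```

For example, `classify(["the", "service", "was", "great"])` returns `"positive"`, while `classify(["terrible", "food"])` returns `"negative"` even though the unlisted word "food" picks up a small positive orientation from its surface similarity to "good". That side effect illustrates why a real semantic similarity measure, rather than a string-based placeholder, matters for unlisted words.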
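Fitting the PLSA model with EM can be sketched in pure Python as below. It assumes the standard asymmetric parameterization P(w|d) = Σ_z P(z|d) P(w|z) with a document-word count matrix n(d, w) as input; the function name `plsa`, the random initialization and the fixed iteration count are illustrative choices. The abstract does not say how the thesis turns the fitted distributions into sentiment labels, so that step is left out.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize(row: List[float]) -> List[float]:
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    s = sum(row)
    return [v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(counts: Matrix, n_topics: int, n_iter: int = 50,
         seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    """Fit PLSA by EM. counts[d][w] = n(d, w).

    Returns (p_z_d, p_w_z, history) where p_z_d[d][z] = P(z|d),
    p_w_z[z][w] = P(w|z) and history holds sum n(d,w) log P(w|d) per iteration."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    history: List[float] = []
    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)  # equals P(w|d) under the current parameters
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    r = n * joint[z] / total  # n(d,w) * P(z|d,w)
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize expected counts into new distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
        history.append(log_lik)
    return p_z_d, p_w_z, history
```

Each row of `p_z_d` places a document in the shared latent space, and each row of `p_w_z` describes a latent topic over words. Because EM never decreases the likelihood, a falling value in `history` signals a bug, which makes it a cheap sanity check when adapting the sketch.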
Keywords/Search Tags:Sentiment Classification, Opinion Mining, Semantic Comprehension, PLSA, Text Orientation