
Research On Problems For Sentiment Classification Of Review Texts Based On Web

Posted on: 2009-09-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S G Wang
GTID: 1118360245499300
Subject: Computer application technology
Abstract/Summary:
With the rapid development of web technology, the Internet has become a very important source of information for more and more people. At the same time, it is rapidly becoming a platform on which people express their viewpoints. Faced with the rapidly growing volume of reviews on the Web, information science and technology confronts a great challenge: how to effectively organize and process document data that hides large amounts of information, obtain the latest information to meet particular needs, and distinguish useful information from worthless information. Sentiment classification automatically classifies a text as expressing positive or negative sentiment by mining and analyzing subjective information in the text, such as standpoint, view, attitude, and mood. Text sentiment classification can be widely applied in fields such as public opinion analysis, online product tracking, and movie and TV program appraisal.

With the aid of theories and methods from computational linguistics, statistics, and machine learning, and from the viewpoint of multi-hierarchy linguistic granularity (word, collocation, relative pair consisting of a product feature and a sentiment word, sentence, and so on), we study the modeling, analysis, and calculation of text sentiment orientation in order to develop new technologies and methods for text sentiment classification. The major works and innovative contributions of this thesis include:

(1) Feature selection for text sentiment classification
From the viewpoints of restricting the range of candidate features and of the category-distinguishing ability of features, three feature selection methods are proposed. The first is based on words with restricted Part of Speech and Information Gain (RPSIG). The second is based on the category-distinguishing ability of words and Information Gain (CDAIG). The third is based on the Fisher criterion (FC).
A comparative experiment indicates that RPSIG and FC are superior to CDAIG.

(2) Automatic acquisition of Chinese sentiment collocations
According to the characteristics of Chinese sentiment collocations, we design ten kinds of collocation patterns and investigate how the length of the selected window influences sentiment collocation. A measure that depicts the association between two words is proposed. Based on the collocation patterns and this measure, a method for acquiring sentiment collocations is put forward.

(3) Automatic identification of relative pairs
Starting from the contextual information that affects whether two words constitute a relative pair, such as Part of Speech, the distance between the two words, and dependency grammar, we explore an automatic identification method for relative pairs based on the Maximum Entropy Model. Two methods for constructing the features of the Maximum Entropy Model are proposed: one based on Part of Speech and distance information, the other based on dependency grammar information. Using these features, various complex feature templates are designed for relative pair identification, and these templates are tested on several different sentence sets.

(4) Multi-hierarchy linguistic granularity analysis for text sentiment classification
Based on the idea that a higher-level linguistic granularity may be expressed with lower-level linguistic granularities, we design a text expression model with a hierarchical structure, namely word (collocation or relative pair) → sentence → text. The sentiment orientations of words directly influence the sentiment orientation of a higher-level linguistic granularity. Therefore, a synonym-based method for classifying word sentiment orientation is proposed. The sentiment orientation of a collocation or a relative pair is then determined from the sentiment orientations of its words.
For sentences and texts, we present a sentiment classification method based on weighted linear combination.

(5) Text sentiment classification based on a generalized rough set model
To make rough set theory suitable for text sentiment classification, the data expression model of classical rough set theory is generalized; that is, a text vector expression model with sentiment orientation intensity is proposed. A discretization method based on the order of sentiment orientation intensities is proposed to reduce the text dimension. We also propose a weighted rough membership function for text sentiment classification.

(6) A consumer-oriented car product evaluation system
Using the theoretical results obtained in this thesis, we develop a consumer-oriented car product evaluation system.
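
The Information Gain criterion that RPSIG and CDAIG build on in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it scores binary term presence with standard Information Gain and keeps the top-k terms, and it omits the Part-of-Speech restriction and the category-distinguishing measure, whose details the abstract does not give. The function names, the toy documents, and the labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard Information Gain of a binary term-presence feature.

    docs   -- list of token sets, one per document (hypothetical toy format)
    labels -- list of class labels aligned with docs
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Start from the class entropy, then subtract the conditional entropy.
    gain = entropy(list(Counter(labels).values()))
    for subset in (with_term, without_term):
        if subset:
            gain -= (len(subset) / n) * entropy(list(Counter(subset).values()))
    return gain


def select_features(docs, labels, k):
    """Return the k terms with the highest Information Gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]


# Toy review data (assumed for illustration only).
docs = [
    {"good", "car"},
    {"good", "engine"},
    {"bad", "car"},
    {"bad", "engine"},
]
labels = ["pos", "pos", "neg", "neg"]

selected = select_features(docs, labels, 2)
```

On this toy data, the sentiment words "good" and "bad" each score 1 bit, while "car" and "engine" score 0 bits, so the two sentiment words are the ones selected.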
Keywords/Search Tags: Text sentiment classification, Feature selection, Multi-hierarchy linguistic granularity, Text expression, Rough set theory