
Research Method On Multi-domain Text Sentiment Orientation Classification

Posted on: 2013-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Bao
GTID: 2248330374456668
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of Web 2.0 technology, a tremendous amount of subjective text is now available online. The need to mine and analyze this text has made text sentiment orientation analysis a focus for many researchers. Feature selection is an important step in text sentiment analysis, but judging features only by their effect on sentiment classification is one-sided. This thesis therefore combines the topic information of the text with the class-discriminating capacity of features, so that the selected features carry both topic information and discriminative power. For the multi-domain text sentiment orientation classification problem, the thesis makes the following contributions:
(1) To support multi-domain text sentiment classification, this thesis uses the LDA model to analyze the text. By establishing associations between the surface text and the different latent topics hidden in its fragments, it obtains each text's probability distribution over topics and thereby clusters the texts by topic. In experiments on 2,704 texts from the 2008 text orientation analysis evaluation, matching a 10-topic clustering against the known domain categories shows that the resulting text subsets have the highest clustering purity.
(2) To study sentiment classification of mixed-domain text further, this thesis uses the LDA model and the Fisher criterion together, combining the two resulting feature sets by intersection and by union to obtain features that discriminate sentiment orientation. Using TF-IDF feature weighting and an SVM classifier, it then compares these feature sets through sentiment classification experiments on the same text corpus. The results show that the intersection of the two feature sets, which has the lowest feature dimension, gives the best sentiment classification results.
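The feature-selection step described in (2) can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's code: the Fisher criterion is applied to binary term-presence features for two sentiment classes, the LDA-derived feature set is a hypothetical hand-written stand-in (the abstract does not say how features are read off the LDA model), and the TF-IDF weighting uses one common smoothed IDF variant. The helpers `fisher_scores`, `top_k` and `tfidf_vector` are hypothetical names.

```python
import math
from collections import Counter

def fisher_scores(docs, labels):
    """Two-class Fisher score per term on binary presence features:
    (mean_pos - mean_neg)^2 / (var_pos + var_neg). labels: 1 = positive, 0 = negative."""
    vocab = sorted({t for d in docs for t in d})
    groups = [[set(d) for d, y in zip(docs, labels) if y == c] for c in (1, 0)]
    scores = {}
    for t in vocab:
        stats = []
        for group in groups:
            xs = [1.0 if t in d else 0.0 for d in group]
            m = sum(xs) / len(xs)
            stats.append((m, sum((x - m) ** 2 for x in xs) / len(xs)))
        (m1, v1), (m0, v0) = stats
        # A small epsilon avoids division by zero for perfectly separating terms.
        scores[t] = (m1 - m0) ** 2 / (v1 + v0 + 1e-9)
    return scores

def top_k(scores, k):
    """The k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])

def tfidf_vector(doc, features, docs):
    """TF-IDF weights of one document over the selected features, with
    TF = count / document length and IDF = log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    tf = Counter(doc)
    total = len(doc) or 1
    vec = {}
    for t in sorted(features):
        df = sum(1 for d in docs if t in d)
        vec[t] = (tf[t] / total) * (math.log((1 + n) / (1 + df)) + 1)
    return vec

# Toy corpus: two positive and two negative reviews (tokenized).
docs = [["good", "great", "phone"], ["great", "screen", "good"],
        ["bad", "poor", "phone"], ["poor", "battery", "bad"]]
labels = [1, 1, 0, 0]

fisher_set = top_k(fisher_scores(docs, labels), 4)
lda_set = {"good", "bad", "phone", "screen"}  # hypothetical LDA-derived features
inter = fisher_set & lda_set                  # intersection: fewer dimensions
union = fisher_set | lda_set                  # union: more dimensions

print(sorted(fisher_set))  # ['bad', 'good', 'great', 'poor']
print(sorted(inter))       # ['bad', 'good']
print(sorted(union))       # ['bad', 'good', 'great', 'phone', 'poor', 'screen']
vec = tfidf_vector(docs[0], inter, docs)
```

The weighted vectors would then be passed to an SVM classifier; a library implementation such as scikit-learn's `sklearn.svm.LinearSVC` is one option. Comparing classifiers trained on `inter` versus `union` mirrors the comparison the abstract describes.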
(3) For multi-domain text sentiment classification, this thesis first uses the LDA model to cluster the mixed-domain text by domain, and then applies Fisher-criterion feature selection within each domain cluster. Again using TF-IDF feature weighting and an SVM classifier, it runs comparison experiments on the same text corpus. The results show that when the domain information is relatively clear, sentiment orientation classification performs well, and that text sentiment classification is associated with domain.
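The two-stage pipeline in (3), together with the clustering-purity measure used in (1), can be sketched as below. Assumptions: the document-topic distributions `thetas` are taken as already estimated by some LDA implementation (the values here are invented toy numbers), each document is hard-assigned to its most probable topic, and a Fisher score on binary term presence stands in for the thesis's exact Fisher-criterion formulation. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def dominant_topics(thetas):
    """Assign each document to its most probable LDA topic (hard clustering)."""
    return [max(range(len(theta)), key=lambda k: theta[k]) for theta in thetas]

def purity(clusters, domains):
    """Sum, over clusters, the count of the most common true domain, divided by N."""
    groups = defaultdict(list)
    for c, d in zip(clusters, domains):
        groups[c].append(d)
    majority = sum(Counter(ds).most_common(1)[0][1] for ds in groups.values())
    return majority / len(domains)

def fisher_score(term, docs, labels):
    """Two-class Fisher score of one term on binary presence features."""
    stats = []
    for y in (1, 0):
        xs = [1.0 if term in d else 0.0 for d, lab in zip(docs, labels) if lab == y]
        if not xs:  # a cluster may lack one sentiment class entirely
            return 0.0
        m = sum(xs) / len(xs)
        stats.append((m, sum((x - m) ** 2 for x in xs) / len(xs)))
    (m1, v1), (m0, v0) = stats
    return (m1 - m0) ** 2 / (v1 + v0 + 1e-9)

def per_domain_features(docs, labels, clusters, k):
    """Select the top-k Fisher-scored terms separately inside each topic cluster."""
    selected = {}
    for c in sorted(set(clusters)):
        sub = [(d, y) for d, y, ci in zip(docs, labels, clusters) if ci == c]
        sub_docs = [d for d, _ in sub]
        sub_labels = [y for _, y in sub]
        vocab = sorted({t for d in sub_docs for t in d})
        ranked = sorted(vocab, key=lambda t: (-fisher_score(t, sub_docs, sub_labels), t))
        selected[c] = ranked[:k]
    return selected

# Toy example: 8 reviews from two domains; thetas are invented, not real LDA output.
thetas = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],
          [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.35, 0.65]]
domains = ["phone", "phone", "phone", "hotel", "hotel", "hotel", "hotel", "hotel"]
docs = [["good", "screen"], ["good", "battery"], ["bad", "screen"], ["bad", "room"],
        ["clean", "room"], ["clean", "staff"], ["noisy", "room"], ["noisy", "staff"]]
labels = [1, 1, 0, 0, 1, 1, 0, 0]

clusters = dominant_topics(thetas)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(purity(clusters, domains))    # 0.875
print(per_domain_features(docs, labels, clusters, 2))
# {0: ['bad', 'good'], 1: ['clean', 'noisy']}
```

Selecting features per cluster lets each domain keep its own sentiment vocabulary (for example "clean" and "noisy" for hotels), which is one way to read the abstract's finding that sentiment classification is associated with domain.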
Keywords/Search Tags: Text sentiment orientation classification, Feature selection, Latent Dirichlet allocation, Fisher discrimination criterion, Multi-domain