Research On The Algorithm Of Comment Sentiment Analysis

Posted on: 2017-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Yuan
GTID: 2348330485462233
Subject: Software engineering
Abstract/Summary:
Text sentiment analysis is an active research topic in natural language processing. This thesis reviews the research background, significance, current state of research in China and abroad, and relevant theory of text sentiment analysis, and focuses on sentiment analysis of comment text. It proposes an improved feature selection algorithm. The main work and contributions are as follows:

(1) The thesis studies sentiment analysis of English comment text, using IMDB movie reviews as the corpus. Building on the theory and techniques of text categorization, it evaluates three text preprocessing methods, five feature models, four feature selection methods, and four classifiers in order to construct an appropriate method for sentiment analysis of English comment text.

(2) Because the chi-square statistic feature selection method suffers from a negative-correlation problem and tends to select low-frequency feature words, the thesis proposes a normalized word frequency based chi-square (NF-CHI) feature selection method. NF-CHI first uses the text length and the distribution of feature words to calculate a normalized word frequency. It then integrates the normalized word frequency, concentration information, and dispersion information into the traditional chi-square model, removing unrelated features in the process. Finally, the validity of the proposed method is verified by experiments.

Experimental results show that the model using unigrams as the feature set, the chi-square statistic as the feature selection method, and an SVM classifier with an RBF kernel achieves the best accuracy.
To verify the effectiveness of the improved algorithm, comparison experiments are run with Naive Bayes and SVM classifiers on a balanced corpus, an imbalanced corpus, a mixed-length corpus, and the Pang movie review corpus. The results demonstrate that the proposed method improves the accuracy of comment text sentiment classification.
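
The abstract does not give the NF-CHI formula, so it is not reproduced here. As background only, the sketch below computes the standard (unmodified) chi-square statistic for one term and one class from a 2x2 document-count table, and illustrates the negative-correlation issue the abstract mentions: because the statistic squares the association term, a word that is strongly absent from a class scores as high as one that is strongly present. The function names and the toy counts are invented for illustration and are not taken from the thesis.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or class never varies
    return n * (a * d - c * b) ** 2 / denom


def signed_association(a, b, c, d):
    """Sign of the term-class association (positive: term favors the class)."""
    return a * d - c * b


# Toy counts (hypothetical): 100 positive and 100 negative reviews.
# Term X appears in 40 positive and 5 negative reviews.
score_x = chi_square(40, 5, 60, 95)
# Term Y appears in 5 positive and 40 negative reviews.
score_y = chi_square(5, 40, 95, 60)

# Both terms receive the same chi-square score for the positive class,
# although term Y is negatively correlated with it.
print(score_x, score_y)
print(signed_association(40, 5, 60, 95), signed_association(5, 40, 95, 60))
```

A selection method that ranks features by this score alone cannot tell the two terms apart, which is one motivation for variants that add frequency or distribution information.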
Keywords/Search Tags: Sentiment Analysis, Feature Extraction, Chi-square statistics, Support Vector Machine