Text Sentiment Analysis Based On Text Classification

Posted on: 2011-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: M Guo
GTID: 2178330332958149
Subject: Computer system architecture
Abstract/Summary:
The study of sentiment orientation in text has become a research focus: a growing number of scholars work on it, and its applications keep expanding. Tasks ranging from public opinion monitoring to assessing word of mouth about products all depend on the sentiment orientation of text. This thesis proposes a combined method, built on traditional text categorization, that joins rules with statistical classifiers. Experiments were carried out on two representative corpora. Corpus 1 is a news corpus with a complex background and an extremely uneven class distribution; Corpus 2 is a stock corpus with a single background.
(1) The first experiment analyzes the sentiment orientation of news text in order to supply sentiment information for automatic news broadcasting. We present a method for identifying the main sentence and use statistical methods to extract potential rules from the main sentences. These rules supplement the manually built rule base in order to improve the sentiment analysis results. Support vector machines, a Bayes classifier, and a K nearest neighbor classifier are each combined with the rules, and a variety of feature extraction and feature weighting methods are tried and their results compared. Because the class distribution of the news corpus is extremely uneven, purely statistical methods perform relatively poorly on the rare classes. Combining rules with statistical methods does not solve this problem completely, but it does improve the experimental results. The results show that the combined rule-and-statistics model is better than the purely statistical model in many fields, which suggests that combining rules with statistical methods generalizes well.
(2) The second study is based on a vertical search application in the stock domain. The application needs to classify stock experts' comments on particular stocks as bullish, neutral, bearish, or uncertain. Because the texts in this corpus are short, strongly domain-specific, and highly colloquial, common word segmentation software does not handle them well. This thesis presents a simple method for locating feature words that meets the requirements of the experiment and is also more time-efficient, with time complexity O(n). Because the domain background is simple, rules can be extracted easily and completely. Accuracy in this experiment reaches 90% or more, and the rule-based method performs better than the statistical method.
On the news corpus, whose background is complex, the combined classification model obtains better results than the purely statistical methods and effectively improves classification of the rare classes. On the single-background stock corpus, however, the combined method brings little improvement. This shows that the rule-based method is well suited to single-background corpora.
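The abstract does not give the details of the feature-word locating method or of how rules and classifiers are combined, so the following is only a minimal sketch of one plausible arrangement, not the thesis's implementation. It assumes a hypothetical rule base that maps feature words to labels, scans the raw text for those words without a segmentation step, and falls back to a small naive Bayes classifier when no rule gives a clear answer. The names `locate_feature_words`, `NaiveBayes`, `classify`, and the toy rules and training data are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def locate_feature_words(text, lexicon):
    """Return (position, word) pairs for lexicon entries found in text.

    No word segmentation is used. At each position only substrings up to
    the longest lexicon entry are checked, so the scan is linear in
    len(text) as long as that maximum length is bounded.
    Note: this matches substrings, so it can fire inside longer words.
    """
    max_len = max(map(len, lexicon), default=0)
    hits = []
    i = 0
    while i < len(text):
        # Prefer the longest feature word that starts at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                hits.append((i, candidate))
                i += size
                break
        else:
            i += 1
    return hits


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (statistical fallback)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in tokens:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, rules, fallback):
    """Apply rules first; use the statistical model if rules are silent or tied."""
    votes = Counter(rules[word] for _, word in locate_feature_words(text, rules))
    top = votes.most_common(2)
    if top and (len(top) == 1 or top[0][1] > top[1][1]):
        return top[0][0]
    return fallback.predict(text.lower().split())


# Hypothetical rule base and training data, for illustration only.
RULES = {"strong buy": "bullish", "buy": "bullish", "sell": "bearish"}
MODEL = NaiveBayes().fit(
    [["profits", "rise"], ["profits", "fall"]],
    ["bullish", "bearish"],
)
```

Calling `classify("analysts say sell", RULES, MODEL)` is decided by the rule base alone, while a comment that contains no feature word, such as `"profits rise sharply"`, is passed to the fallback classifier. Putting rules first reflects the abstract's finding that rules do well on a single-background corpus; on a corpus like the news set, a different way of merging the two sources may be more appropriate.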
Keywords/Search Tags: Support vector machine, Bayes, K nearest neighbor, feature selection, weight calculation