
Sentiment Classification of Microblog Corpus Based on an Automatically Annotated Training Set

Posted on: 2014-04-12  Degree: Master  Type: Thesis
Country: China  Candidate: S N Li  Full Text: PDF
GTID: 2268330401981196  Subject: Computer software and theory
Abstract/Summary:
Microblogging quickly became a favorite of Internet users because it is fast, convenient, and real-time. Microblogs have become an important channel for emotional expression, and their importance has gradually attracted attention. This thesis studies methods for classifying the sentiment of microblog posts. We propose a method that annotates a microblog corpus automatically, without any human intervention, and use it to classify microblog texts by sentiment. Sentiment polarity is divided into three categories: positive, negative, and neutral. The microblog corpus is collected by combining the microblog API with web-page parsing. The training corpus is labeled with sentiment polarity by automatic annotation; features are then extracted from the labeled training data and used to classify the test microblogs, which yields sentiment classification of microblog text. Finally, the method is verified experimentally.

The main tasks are as follows. First, based on the existing literature and an analysis of the characteristics of microblog data, the granularity of the study is determined, and an automatic annotation method that combines emoticons with sentiment words is proposed to label the training set. The method is highly general: it avoids the large labor and time costs of manual labeling, reduces the dependence of conventional labeling methods on domain, topic, and time, and improves the accuracy of existing automatic annotation methods. Second, feature extraction for the training set is studied: an N-gram model is used to extract hot words and features.
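The abstract does not give the emoticon library, the sentiment lexicon, or the exact rule for combining them, so the following is only a minimal sketch under stated assumptions. The seed lists, the equal +1/-1 weighting, and the function names `auto_label`, `ngrams`, and `hot_ngrams` are all hypothetical. It illustrates the two ideas named above: labeling a tokenized post from emoticons plus sentiment words, and counting N-gram features to find frequent ("hot") terms.

```python
from collections import Counter

# Hypothetical seed lists; the thesis's actual emoticon library and
# sentiment lexicon are not given in the abstract.
POSITIVE_EMOTICONS = {"[smile]", "[haha]", ":)"}
NEGATIVE_EMOTICONS = {"[sad]", "[angry]", ":("}
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"bad", "hate", "terrible"}


def auto_label(tokens):
    """Label a tokenized post as 'positive', 'negative' or 'neutral'.

    Assumed rule: every positive emoticon or word adds 1, every negative
    one subtracts 1, and the sign of the total decides the label.
    """
    score = 0
    for tok in tokens:
        if tok in POSITIVE_EMOTICONS or tok in POSITIVE_WORDS:
            score += 1
        elif tok in NEGATIVE_EMOTICONS or tok in NEGATIVE_WORDS:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def ngrams(tokens, n_max=2):
    """Return all 1- to n_max-grams of a token list, joined with spaces."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def hot_ngrams(posts, n_max=2, top_k=5):
    """Return the top_k most frequent n-grams across tokenized posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(ngrams(tokens, n_max))
    return counts.most_common(top_k)
```

For example, `auto_label(["love", "this", "[smile]"])` returns `"positive"`, and `ngrams(["a", "b", "c"])` returns `["a", "b", "c", "a b", "b c"]`. Real Chinese microblog text would first need word segmentation, which this sketch leaves out.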
Third, a Bayesian classifier is used to classify the test data; to further improve accuracy, the Naive Bayes classifier is also combined with a maximum entropy algorithm for optimization. Finally, a system for analyzing the sentiment tendency of holiday microblogs is implemented, the model and algorithm are evaluated experimentally, and the results verify the effectiveness and feasibility of the proposed method.

The microblog sentiment analysis method presented in this thesis helps track public feedback on products, hot topics, and policies in a timely manner, providing decision support for users, enterprises, and government. Although the experimental results show that the method's classification performance is satisfactory, some problems remain: the emoticon library and the sentiment knowledge base need further improvement, and how to make them evolve automatically requires further study.
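For the classification step named in the third task, the abstract says only that a Bayesian classifier is used and later combined with maximum entropy. A common instantiation is multinomial Naive Bayes over N-gram features with add-one (Laplace) smoothing; the sketch below assumes that variant, uses hypothetical names (`features`, `NaiveBayes`), and does not attempt the maximum-entropy optimization.

```python
import math
from collections import Counter, defaultdict


def features(tokens):
    """Unigram plus bigram features (an assumed feature set)."""
    grams = list(tokens)
    grams += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return grams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            feats = features(tokens)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total feature occurrences per class, used in the smoothed denominator.
        self.total = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | post): log prior + sum of log likelihoods."""
        score = math.log(self.class_counts[c] / self.n_docs)
        denom = self.total[c] + len(self.vocab)
        for f in features(tokens):
            if f in self.vocab:  # skip features never seen in training
                score += math.log((self.feat_counts[c][f] + 1) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))
```

In this setup the training labels would come from the automatic annotation step rather than from human annotators, so `fit` can be called directly on the auto-labeled corpus. Working in log space avoids numerical underflow when a post has many features.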
Keywords/Search Tags: Microblogging, Sentiment Analysis, Automatic Annotation, Feature Extraction, Bayesian Classifier