Sentiment Analysis And Related Issues For Twitter

Posted on: 2015-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhu
GTID: 2298330452450764
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of the Internet and the mobile Internet, social networks have also developed rapidly. Users now actively generate text on the Internet, which means that a person is no longer merely an audience member but a part of the Internet. The mobility of microblogs, together with the ease of sharing, simplicity, and real-time nature of their content, has made the microblog an indispensable social network for interaction in the daily lives of Internet users, most of whom have the space and freedom to express their opinions. These expressions or comments can be a simple message from an ordinary user, a purchase intention from an online consumer, a movie review from a film fan, or an opinion about policies and regulations published by government bodies. How to extract valuable content from the vast amount of unstructured short text has become a problem that needs to be solved. The popularity of social networks has given rise to a new research field: microblog sentiment analysis. Microblog sentiment analysis inherits the characteristics of text sentiment analysis: it analyzes the sentiment orientation of the emotional expressions in microblogs, and the result is a division of microblog sentiment into positive and negative classes, or into positive, negative, and neutral classes, so that researchers can clearly see whether the attitude expressed by a text is supportive or opposed and make decisions accordingly.

This thesis mainly studies how to apply traditional text classification methods to the sentiment classification of microblogs, using machine learning methods to implement Twitter sentiment classification. We analyze the key technical problems of Twitter sentiment analysis and focus on processes and methods for increasing classification accuracy. We also analyze how different methods of feature extraction, feature weight calculation, text representation, and classifier model construction influence the accuracy of Twitter sentiment classification.

We use Twitter data as the dataset and preprocess the tweets with the part-of-speech tagger developed by the Stanford Natural Language Processing Group. After preprocessing, we apply three feature extraction methods, namely document frequency, information gain, and chi-square (CHI), to extract features from the dataset, and then use Boolean weighting, term frequency, and TFIDF (Term Frequency Inverse Document Frequency) respectively to calculate the feature weights. Finally, two supervised learning classifiers, the Naive Bayes classifier and the Decision Tree classifier, are used to classify the text sentiment. We ran many experiments with different numbers of features, feature weightings, and classification algorithms to train the classifiers, and then evaluated them on test data. The experimental results indicate that, among the experiments in this thesis, the combination of Naive Bayes, CHI, and TFIDF performs best.
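
The best-performing combination named above (CHI feature selection, TFIDF weighting, Naive Bayes) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy tweets, the whitespace tokenizer, all function names, the choice of k = 4 features, the IDF form log(N/df), and the use of TFIDF weights as fractional counts in a Laplace-smoothed multinomial Naive Bayes are assumptions made here for illustration. The Stanford part-of-speech tagging step is not reproduced.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tweets, invented for illustration (not thesis data).
TRAIN = [
    ("i love this phone great screen", "pos"),
    ("great movie i love it", "pos"),
    ("what a great happy day", "pos"),
    ("i hate this phone bad battery", "neg"),
    ("bad movie i hate it", "neg"),
    ("what a bad sad day", "neg"),
]

def tokenize(text):
    # Assumption: plain whitespace split instead of POS-based preprocessing.
    return text.lower().split()

def chi_square(term, label, docs):
    """Chi-square statistic of a term against a class, from a 2x2 table."""
    a = b = c = d = 0
    for tokens, y in docs:
        present = term in tokens
        if present and y == label:
            a += 1          # term present, document in class
        elif present:
            b += 1          # term present, document in another class
        elif y == label:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, k):
    """Keep the k terms with the highest per-class chi-square score."""
    labels = {y for _, y in docs}
    vocab = {t for tokens, _ in docs for t in tokens}
    score = {t: max(chi_square(t, y, docs) for y in labels) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

def idf_table(docs, features):
    """Inverse document frequency, log(N / df), for the selected terms."""
    n = len(docs)
    df = Counter(t for tokens, _ in docs for t in set(tokens) if t in features)
    return {t: math.log(n / df[t]) for t in features}

def tfidf(tokens, idf):
    """TFIDF weights of one document, restricted to the selected terms."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: tf[t] * idf[t] for t in tf}

def train_nb(docs, idf):
    """Multinomial Naive Bayes over TFIDF weights with Laplace smoothing."""
    prior = Counter(y for _, y in docs)
    weight = defaultdict(lambda: defaultdict(float))
    for tokens, y in docs:
        for t, w in tfidf(tokens, idf).items():
            weight[y][t] += w
    model = {}
    for y in prior:
        total = sum(weight[y].values()) + len(idf)
        log_like = {t: math.log((weight[y][t] + 1) / total) for t in idf}
        model[y] = (math.log(prior[y] / len(docs)), log_like)
    return model

def classify(text, model, idf):
    """Return the class with the highest log posterior score."""
    vec = tfidf(tokenize(text), idf)
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)

docs = [(tokenize(text), y) for text, y in TRAIN]
features = select_features(docs, k=4)
idf = idf_table(docs, features)
model = train_nb(docs, idf)
print(features)
print(classify("i love this great movie", model, idf))
```

On this toy data the four selected terms are bad, great, hate, and love, and the example tweet is labeled pos. Swapping `select_features` for a document-frequency or information-gain ranking, or `tfidf` for a Boolean or raw term-frequency weighting, would give the other combinations the abstract compares.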
Keywords/Search Tags: Sentiment analysis, Text classification, Feature extraction, Feature weight, Supervised learning