
Sentiment analysis of Twitter data

Posted on: 2017-01-26
Degree: M.S.
Type: Thesis
University: Rensselaer Polytechnic Institute
Candidate: Yuan, Bo
Full Text: PDF
GTID: 2468390011998760
Subject: Computer Science
Abstract/Summary:
Sentiment analysis and opinion mining have become a research hotspot with the rapid development of social networking websites. Twitter is a typical social network application, with millions of users expressing their sentiment every day. In this work, we comprehensively explored the methodologies applied to sentiment classification of Twitter data: lexicon-based, rule-based, and machine learning-based methods. Our data set was crawled and manually cleaned following the principle of Naturally Annotated Big Data. It contains 20,000 tweets covering ten popular topics.

For lexicon-based methods, we experimented with the Simple Word Count approach and the Feature Scoring approach using the most popular sentiment lexicons and semantic resources, namely the MPQA subjectivity lexicon, SentiWordNet, the Vader Sentiment Lexicon, Bing Liu's lexicon, and General Inquirer. We built customized sentiment lexicons, designed feature scores, and compared ten classifiers on real-world Twitter data. Further, we designed Linguistic Inference Rules (LIR) to improve the lexicon-based classifiers. LIR aims to handle negation, valence shift, and contrast conjunctions in natural language. For machine learning-based methods, we used state-of-the-art supervised learning models: Naive Bayes, Maximum Entropy, and Support Vector Machines. Two sets of features are compared: the first is Bag-of-Words with N-grams, and the second is Part-of-Speech linguistic annotation.
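
To make the lexicon-based idea concrete, the following is a minimal Python sketch of a word-count classifier with two simple rules for negation and contrast conjunctions. It is an illustration only and not the thesis's implementation: the tiny word lists, the function names (`tokenize`, `score`, `classify`), and the specific rules (a negator flips the next sentiment word; only the clause after the last contrast conjunction is scored) are all assumptions made here. Valence shifters such as intensifiers are not handled.

```python
import re

# Toy word lists for illustration only (assumed here, not taken from
# MPQA, SentiWordNet, or any other real lexicon).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}
NEGATORS = {"not", "no", "never"}
CONTRAST = {"but", "however"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score(tokens):
    """Sum +1/-1 per sentiment word; a negator flips the next sentiment word."""
    total = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS or tok.endswith("n't"):
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            total += -polarity if negate else polarity
            negate = False
    return total


def classify(text):
    """Label text as positive, negative, or neutral."""
    tokens = tokenize(text)
    # Assumed contrast rule: only the clause after the last
    # contrast conjunction determines the overall sentiment.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CONTRAST:
            tokens = tokens[i + 1:]
            break
    s = score(tokens)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(classify("I do not love this phone"))
print(classify("The screen is great but the battery is terrible"))
```

A real system would replace the toy sets with one of the lexicons named above and would likely limit how far a negator's scope extends; the sketch keeps the scope rule deliberately simple.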
Keywords/Search Tags: Sentiment, Twitter, Data