Sentiment drift and its effect on the classification of Web log posts

Posted on:2009-09-19

Degree:Ph.D

Type:Thesis

University:Harvard University

Candidate:Durant, Kathleen T

GTID:2448390005958938

Subject:Computer Science

Abstract/Summary:

Sentiment classification separates a collection of opinionated text into two opposing classes: favorable and unfavorable. It has been successfully applied to online product comments and movie reviews. Previous studies have shown that topic, domain, and time influence the results of machine learning models used to classify sentiment. This thesis furthers the investigation of the effect of time on sentiment classification. It defines the phenomenon of sentiment drift: the change of sentiment over time. We create a topic-specific corpus and demonstrate a change in sentiment over specific time periods. The source of the corpus is web logs; we find it to be more difficult to classify than previously studied corpora.

Previous work has shown that factors such as machine learning induction technique, class composition, dataset size, and feature selection all influence predictability. We show that models with configurations that maximize predictability under these factors are still influenced by time. The most successful configuration we found is a collection of Naive Bayes models with applied feature selection and a balanced class composition. The collection, on average, correctly predicts the sentiment of a web log post 89.77% of the time.

We perform collections of sentiment classification experiments, varying the difference (in months) between the testing and the training period, which we call the testing-training difference (TTD). We show that as the TTD increases, the predictability of the sentiment model decreases. Models trained on months chronologically closer to the testing month produce significantly higher accuracies. We also show that models trained on future data significantly outperform models trained on past data. We investigate statistical subsets of the models and show that each subset is influenced by the TTD.

We show that models that incorporate the influence of time produce higher predictability. We find, for example, that ensemble models that define a weight based on the TTD produce higher predictability than those that do not ([2.176, 5.092] alpha-level .05). The findings show that 3-month ensembles outperform 5-month ensembles ([.39 alpha-level .05]), indicating that component models created more than three months from the testing examples decrease the results of an ensemble.
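
As a rough illustration of the TTD and of a TTD-weighted ensemble, the following Python sketch may help. It is not taken from the thesis: the function names, the sign convention (testing month minus training month), the +1/-1 label encoding, and in particular the weighting formula 1 / (1 + |TTD|) are all assumptions made for this example. The abstract says only that the weight is based on the TTD.

```python
from datetime import date


def ttd_months(train_month: date, test_month: date) -> int:
    """Signed testing-training difference in months (test minus train).

    Assumed convention: a negative value means the model was trained on
    data from after the testing month ("future" data).
    """
    return (test_month.year - train_month.year) * 12 + (
        test_month.month - train_month.month
    )


def ttd_weight(ttd: int) -> float:
    """Hypothetical weight: models trained closer to the testing month
    count more. This formula is an assumption, not the thesis's."""
    return 1.0 / (1.0 + abs(ttd))


def weighted_vote(predictions, test_month: date) -> int:
    """Combine component-model predictions into one label.

    predictions: iterable of (train_month, label) pairs, where label is
    +1 (favorable) or -1 (unfavorable).
    """
    score = sum(
        ttd_weight(ttd_months(train_month, test_month)) * label
        for train_month, label in predictions
    )
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    test_month = date(2006, 6, 1)
    component_predictions = [
        (date(2006, 5, 1), +1),  # TTD 1, weight 0.5
        (date(2006, 3, 1), -1),  # TTD 3, weight 0.25
        (date(2006, 2, 1), -1),  # TTD 4, weight 0.2
    ]
    print(weighted_vote(component_predictions, test_month))
```

In this made-up example, an unweighted majority vote would return unfavorable (two votes to one), while the weighted vote follows the model trained one month before the testing month. The dates and labels are invented for illustration and are not results from the thesis.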

Keywords/Search Tags:

Sentiment, Classification, Models, Web, TTD

Related items

1	Neural Network Models Incorporating Sentiment Information For Short Text Sentiment Classification
2	Sentiment drift and its effect on the classification of Web log posts
3	Research On Aspect Category Sentiment Classification Based On Deep Learning Models
4	Research And Application On Chinese Micro-Blog Sentiment Classification
5	Research On Sentiment Classification For Microblogging Based On Multimodal Data
6	Research On Sentiment Analysis Of Chinese Text-Oriented Neural Network Models
7	Chinese Product Sentiment Classification Based On Sentiment Strengths
8	The Key Technologies’ Research And Implementation About Information Acquisition And Emotion Classification On New Social Network Media
9	Research On Sentiment Classification For Web Reviews
10	Research On Key Techniques Of User-oriented Text Sentiment Analysis