Research On Weibo Sentiment Analysis Technology Based On Semi-supervised Learning

Posted on: 2019-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2438330545456935
Subject: Computer application technology
Abstract/Summary:
In the Web 2.0 era, Internet users are more than information receivers: they are increasingly willing to share their lives and thoughts online. Microblogs, as a simple and fast platform, have gained popularity all over the world. Billions of microblog posts are created daily, and they carry rich information despite being disorganized and irregular. Such information can be valuable for personal decisions and product improvement, so sentiment analysis of microblogs has become a hot topic in academic research. Current work on sentiment analysis focuses mainly on supervised algorithms, which achieve good performance at the cost of large collections of labeled data. However, large amounts of labeled data are difficult to acquire, whereas unlabeled data of the same size is much easier to obtain. This thesis focuses on semi-supervised learning algorithms for Chinese sentiment analysis, combining labeled and unlabeled data to build a better model. Because microblogs are highly noisy and use informal language, a simple sentiment analysis method may not work well. The research therefore covers preprocessing, feature extraction, and model construction, with the main work as follows:

(1) Improvement of feature extraction. Information gain ratio, a popular feature extraction algorithm, does not suit features specific to microblogs, such as emoticons. This thesis proposes an improved algorithm based on feature fusion, which filters traditional features and emoticons with different methods and then merges the results. Experiments show that, at the same feature dimension, the improved method works better than the baseline.

(2) Improvement of feature weighting. Emoticons are key features for microblog sentiment analysis, but the traditional TF-IDF algorithm does not fit such sparse features. This thesis provides an improved algorithm that increases emoticon weights. With a suitable emoticon weight, the improved algorithm works better when all other settings stay the same.

(3) Improvement of the Progressive Transductive Support Vector Machine (PTSVM) algorithm. The idea of PTSVM is to iteratively add unlabeled data together with their predicted labels to improve the classifier. However, the classification hyperplane is determined by only a few support vectors, while most samples have no impact on it. As a result, most added samples do not affect the model at all, or worse, some push the hyperplane in the wrong direction. This increases running time and may lower accuracy to some degree. This thesis proposes a revised PTSVM algorithm based on KNN and three clustering algorithms, which clusters the unlabeled samples beforehand and removes unnecessary data. Experiments show that PTSVM with KNN and K-means achieves a large reduction in running time and a slight improvement in accuracy.
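
The emoticon weighting idea in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption made for illustration: the function name `tfidf_with_emoticon_boost`, the multiplicative `boost` factor, the smoothed IDF variant, and the bracketed example tokens such as `[haha]` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_with_emoticon_boost(docs, emoticons, boost=2.0):
    """Compute TF-IDF weights and scale emoticon tokens by `boost`.

    docs      -- list of token lists (one list per microblog post)
    emoticons -- set of tokens treated as emoticons (assumed known)
    boost     -- hypothetical multiplier; NOT a value from the thesis
    Returns one {token: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: number of posts that contain each token.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vec = {}
        for tok, count in counts.items():
            tf = count / total
            # Smoothed IDF (an assumed variant, chosen to avoid zero weights).
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            weight = tf * idf
            if tok in emoticons:
                weight *= boost  # raise the influence of emoticon features
            vec[tok] = weight
        weighted.append(vec)
    return weighted


# Hypothetical, already-tokenized posts with bracketed emoticon tokens.
DOCS = [
    ["today", "happy", "[haha]"],
    ["today", "sad", "[tears]"],
    ["weather", "nice", "[haha]"],
]
EMOTICONS = {"[haha]", "[tears]"}

if __name__ == "__main__":
    for vec in tfidf_with_emoticon_boost(DOCS, EMOTICONS, boost=2.0):
        print(vec)
```

In a sketch like this, `boost` would be tuned on held-out data, which matches the abstract's remark that the improvement depends on a suitable emoticon weight.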
Keywords/Search Tags:Sentiment Analysis, Microblog, Semi-supervised Algorithm, PTSVM Algorithm, Text Clustering