
The Design And Prototype Implementation Of Sentiment Analysis System Based On Semi-supervised Learning

Posted on: 2018-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Li
GTID: 2348330512989801
Subject: Software engineering
Abstract/Summary:
With the development of the Internet, social platforms such as microblogs and Twitter have gradually appeared, and people like to express their ideas on them. These posts make it possible to analyze how most people feel about a given topic. However, processing a large volume of tweets by hand is very difficult, so using computers instead of manpower to classify messages has become a trend. Semi-supervised learning, which needs only a small amount of labeled training data, can address this problem efficiently and has become a research hotspot.

We design a platform that analyzes sentiment polarity and provides a complete workflow from data processing to algorithm selection. On this platform, the user can choose different approaches on the basis of existing datasets and run them in parallel. By summarizing the results of these runs, the platform recommends the best combination to the user.

First, this thesis summarizes the current situation and the related technologies in the field of sentiment analysis. Second, it proposes the requirements of the system according to practical application. Finally, it designs the data processing module, the feature selection module, the algorithm analysis module and the distributed operation module.

This thesis has the following characteristics:

(1) Distributed operation. Data preprocessing and feature selection each offer many processing methods, so it is difficult for users to make the right choice, and trying the options one by one is particularly time-consuming. The distributed operation module therefore calculates the results of the various processing methods in a multi-threaded manner. The program selects the best way to handle the particular data and returns the results to the user.

(2) A variety of features. Sixteen sentiment features are set up in the system, including punctuation, parts of speech and other attributes. Building on existing theories and techniques, the platform also provides the N-gram calculation method and the dependency relation method, and it achieves a comprehensive analysis by combining a variety of features. To reduce classification time, the features are ranked by importance and those with higher contribution are retained.

(3) A semi-supervised classification model in the algorithm entity module. The model combines two classifiers, SVM and Naive Bayes, and it can continuously update itself during prediction to improve accuracy. The algorithm entity module also provides the machine learning algorithm and the Lexicon-Based approach. The Lexicon-Based approach relies on an emotional dictionary, a list of words with known sentiment scores, and calculates the score of a sentence from the scores of its words.

(4) A variety of data processing methods. The system can collect data from the social network according to keywords provided by the user and save them in the prescribed format. It can also filter the data, perform word segmentation, perform part-of-speech tagging and so on.
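The Lexicon-Based approach described in the abstract can be illustrated with a minimal sketch. The abstract does not say how word scores are combined or which dictionary is used, so the code below assumes a simple sum of word scores, whitespace tokenization, and a tiny made-up lexicon; the names `TOY_LEXICON`, `sentence_score` and `polarity` are hypothetical and are not taken from the thesis.

```python
# Hypothetical toy lexicon: the thesis does not list its dictionary entries.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def sentence_score(sentence, lexicon=TOY_LEXICON):
    """Sum the scores of known words; unknown words contribute 0.

    Assumption: summation is used here for illustration only. The thesis
    may combine word scores differently (for example, by averaging).
    """
    tokens = sentence.lower().split()
    # Strip common trailing/leading punctuation before the dictionary lookup.
    return sum(lexicon.get(tok.strip(".,!?;:"), 0.0) for tok in tokens)


def polarity(sentence, lexicon=TOY_LEXICON):
    """Map the sentence score to a coarse polarity label."""
    score = sentence_score(sentence, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("Great phone, bad battery!")` sums the scores of "great" and "bad" from the toy lexicon and labels the sentence from the sign of the total. A real system would need word segmentation and a full emotional dictionary in place of the whitespace split and toy lexicon used here.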
Keywords/Search Tags: Sentiment Analysis, Machine Learning, Semi-supervised Learning Algorithm, Text Classification, Emotional Dictionary