Research On Semi-supervised Sentiment Analysis Method Of Weibo Text

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2428330611968730
Subject: Air transportation big data project
Abstract/Summary:
Because social media is popular and the threshold for publishing is low, the emotions and opinions expressed on it provide up-to-date and wide-ranging information. With its concise, real-time posts, Weibo has quickly become a very popular social network and new media platform. Sentiment analysis of Weibo text is of great significance for consumers and enterprises collecting opinions on products or services, and it also plays an important role in national security and public opinion analysis. Machine learning, especially supervised learning, requires a large amount of labeled training data, and labeling data at that scale is expensive and time-consuming in practice. Unlabeled data, by contrast, is easy to obtain in large quantities from the Internet, so making the best use of unlabeled data for data mining has become a research hotspot. To address the shortage of labeled data, this thesis applies semi-supervised learning to sentiment analysis of Weibo text.

First, a semi-supervised text classification model based on multi-classifier integration is constructed. Drawing on the idea of ensemble learning and built on the Bagging framework, the model samples the training data with the bootstrap method. Out-of-bag (OOB) data is used to evaluate both the quality of the data labels and the performance of the classifiers. The two resulting weights are combined into the weight used to decide a data label, and the sentiment polarity of a text is determined jointly by fusing the results of multiple classifiers. In addition, a dynamic threshold function is proposed for the training process to balance the quantity and quality of the labeled data. The multi-classifier semi-supervised method is suitable when few initial labels are available.

However, that method is internally a supervised learning process, and it does not use the large amount of unlabeled data to mine the distribution of the data for more accurate classification. Therefore, a semi-supervised generative model based on variational self-encoding (the variational autoencoder, VAE) is proposed. Because the long short-term memory network (LSTM) is well suited to text sequences, it is used as both the encoder and the decoder of the semi-supervised variational self-encoding model, encoding the original text sequence and generating a new text sequence. Bi-LSTM, a variant of the LSTM, is used as the classifier in the model. The classifier has two functions: first, it provides label information to the encoder, which uses it to generate the hidden variables; second, it provides label information to the decoder, which generates new samples from the labels together with the hidden variables. The whole model improves classifier accuracy by optimizing an objective function that relates the original samples to the samples the model generates. Finally, the COAE2014 Task 4 data set was used to validate the effectiveness of the two models.
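The first model described above (bootstrap sampling, out-of-bag evaluation, weighted fusion of classifier votes, and a dynamic threshold for accepting pseudo-labels) can be illustrated with a minimal Python sketch. It is not the thesis implementation. Everything named here is an assumption made for illustration: the toy `CentroidClassifier` stands in for real text classifiers, each classifier is weighted only by its OOB accuracy (the thesis also weights the quality of the data labels), and `dynamic_threshold` uses a simple linear decay whose form and parameters are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob


class CentroidClassifier:
    """Toy stand-in base learner: nearest class centroid on numeric vectors."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def train_ensemble(X, y, n_models, rng):
    """Train bagged classifiers and weight each one by its OOB accuracy."""
    ensemble = []
    for _ in range(n_models):
        in_bag, oob = bootstrap_split(len(X), rng)
        clf = CentroidClassifier().fit([X[i] for i in in_bag],
                                       [y[i] for i in in_bag])
        if oob:
            acc = sum(clf.predict(X[i]) == y[i] for i in oob) / len(oob)
        else:
            acc = 0.5  # assumption: neutral weight when no OOB samples exist
        ensemble.append((clf, acc))
    return ensemble


def weighted_vote(ensemble, x):
    """Fuse classifier outputs; confidence is the winner's share of total weight."""
    scores = {}
    for clf, weight in ensemble:
        label = clf.predict(x)
        scores[label] = scores.get(label, 0.0) + weight
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, (scores[best] / total if total > 0 else 0.0)


def dynamic_threshold(round_idx, start=0.9, floor=0.6, decay=0.05):
    """Hypothetical schedule: strict at first, relaxed in later rounds."""
    return max(floor, start - decay * round_idx)


def self_train(X_lab, y_lab, X_unlab, rounds=3, n_models=7, seed=0):
    """Iteratively move confidently pseudo-labeled samples into the labeled set."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for r in range(rounds):
        ensemble = train_ensemble(X_lab, y_lab, n_models, rng)
        tau = dynamic_threshold(r)
        remaining = []
        for x in pool:
            label, conf = weighted_vote(ensemble, x)
            if conf >= tau:
                X_lab.append(x)
                y_lab.append(label)
            else:
                remaining.append(x)
        pool = remaining
    return train_ensemble(X_lab, y_lab, n_models, rng), X_lab, y_lab
```

In this sketch the threshold trades quantity against quality: a high value early on admits only pseudo-labels that nearly all of the weighted classifiers agree on, and lowering it later lets more unlabeled samples into the training set.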
Keywords/Search Tags:Weibo, Sentiment Analysis, Semi-supervised, Multi-classifier, Variational Self-encoding, Bi-LSTM