
Chinese Microblog Sentiment Analysis Based On Big Data Platform

Posted on: 2020-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2428330578455874
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, microblogging has grown quickly thanks to its individuality and timeliness, and has become an important channel through which users publish and share information. People increasingly share their lives and emotions on social media such as microblogs, so these platforms carry a large amount of subjective information, often with strong emotional coloring. Analyzing this subjective information can yield knowledge that is useful in daily life and work, so sentiment analysis of microblogs has considerable value. Text sentiment analysis is gradually becoming a focus of information research, mining, and application.

The development of the Internet has also ushered in the era of big data. On microblogs, big data mainly refers to the text published by users and the pictures and videos they upload. Faced with a huge volume of microblog data and complex algorithms, traditional single-machine sentiment analysis methods struggle to complete the analysis task accurately and quickly. Distributed data storage and analysis methods address the shortcomings of the single-machine mode: by building parallel storage and computation, they can improve the processing efficiency and accuracy of text analysis.

This paper implements a method for Chinese microblog sentiment analysis based on a big data platform. First, according to the characteristics of microblog comments and in combination with existing authoritative sentiment dictionaries such as HowNet, a method for constructing and expanding a microblog sentiment dictionary is proposed. In the first step, the SO-PMI algorithm uses pointwise mutual information to calculate the similarity between seed words and unregistered words, giving an initial estimate of the emotional tendency of each unregistered word. In the second step, the Word2vec tool is trained on the sample set, and the emotional orientation of the unregistered words is calculated from the distance between word vectors. In the third step, this result is combined with the result of the SO-PMI algorithm to obtain the emotional polarity value of each unregistered word; according to this value, the word is added to the corresponding sentiment dictionary, which completes the construction and expansion of the dictionary.

Second, experiments are designed in which semantic rules are formulated in combination with the sentiment dictionary and applied to a microblog corpus, verifying the effectiveness of the constructed dictionary in microblog sentiment analysis.

Finally, the Spark platform is built and the SVM algorithm is used to complete the sentiment analysis of Chinese microblogs. First, a program that uses the sentiment dictionary constructed in this paper automatically annotates the training set. Second, features are selected from the text corpus and feature weights are calculated. Then an SVM model performs the Chinese microblog sentiment analysis. Finally, the results obtained by the model are compared with those of the Naive Bayes algorithm and with the results obtained in single-machine mode. The experimental results show that the Chinese sentiment analysis method based on the distributed Spark platform can be applied to large-scale sentiment analysis tasks and that it is feasible for processing large-scale text information.
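The SO-PMI step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formula, so the sketch assumes the common definition in which a word's score is the sum of its PMI with the positive seed words minus the sum of its PMI with the negative seed words. It also assumes document-level co-occurrence counts and a small additive smoothing constant. The function names, the seed words, and the toy pre-segmented corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per document, how often each word and each unordered word pair occurs."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return word_df, pair_df


def pmi(a, b, word_df, pair_df, n_docs, smoothing=0.01):
    """PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), estimated from document counts.

    The smoothing constant (an assumption of this sketch) keeps the logarithm
    finite when the two words never co-occur.
    """
    if word_df[a] == 0 or word_df[b] == 0:
        return 0.0  # no evidence for a word that is absent from the corpus
    joint = pair_df[frozenset((a, b))] + smoothing
    return math.log2(joint * n_docs / (word_df[a] * word_df[b]))


def so_pmi(word, pos_seeds, neg_seeds, word_df, pair_df, n_docs):
    """Positive score suggests positive orientation, negative score the opposite."""
    pos = sum(pmi(word, s, word_df, pair_df, n_docs) for s in pos_seeds)
    neg = sum(pmi(word, s, word_df, pair_df, n_docs) for s in neg_seeds)
    return pos - neg


# Hypothetical, already word-segmented microblog posts.
docs = [
    ["电影", "好看", "喜欢"],
    ["服务", "糟糕", "失望"],
    ["喜欢", "给力"],
    ["失望", "坑爹"],
]
word_df, pair_df = cooccurrence_counts(docs)
pos_seeds, neg_seeds = ["喜欢"], ["失望"]

score_pos = so_pmi("给力", pos_seeds, neg_seeds, word_df, pair_df, len(docs))
score_neg = so_pmi("坑爹", pos_seeds, neg_seeds, word_df, pair_df, len(docs))
```

In this toy corpus "给力" co-occurs only with the positive seed and "坑爹" only with the negative seed, so their scores have opposite signs. In the pipeline the abstract describes, a score like this would only be an initial estimate, to be combined with the word-vector result before a word is added to a dictionary.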
Keywords/Search Tags:Microblog, Sentiment analysis, Big data, Spark