
Research And Design Of Public Sentiment Analysis System Based On Doc2vec And SVM

Posted on: 2018-05-16  Degree: Master  Type: Thesis
Country: China  Candidate: R Y Gan  Full Text: PDF
GTID: 2348330518993310  Subject: Information security
Abstract/Summary:
With the rapid development of the Internet, people increasingly express their feelings and attitudes online. Micro-blogs, forums, post bars, mobile news and other new media have emerged in succession. The comments people post on these platforms reflect complex and varied sentiments, which play a key role in analyzing the soaring short-term growth of online public opinion. Because public opinion forms rapidly, spreads widely, reaches a large scale, and grows explosively, the government can use computer technologies such as Natural Language Processing and Machine Learning to analyze sentiment in these massive volumes of text. Doing so helps improve the monitoring, analysis, early warning, and guidance of online public opinion, and helps build a healthy and harmonious environment for it.

This thesis proposes a text feature extraction method based on the word2vec and doc2vec models. The method consists of preprocessing the text data, extracting initial features with the doc2vec model, generating a sentiment dictionary with the word2vec model, generating new features from the sentiment dictionary, and combining the initial and new features into the final features of the text. A support vector machine with an RBF kernel is selected as the classifier, and a public opinion analysis system is designed and implemented, achieving F1 = 0.89 and AUC = 0.95.

The main work of this thesis covers the following aspects:

1. The traditional vector space model, the probabilistic topic model, and the word vector model based on distributed representation are introduced and compared. The word2vec and doc2vec models, which developed from the distributed-representation word vector model, are then introduced, followed by the principles of four mainstream classification models: logistic regression, random forest, decision tree, and support vector machine.

2. On the basis of the word2vec and doc2vec models, a text feature extraction method is proposed. The thesis studies how to preprocess public opinion text data, including the handling of punctuation, pauses, negation words, and digits. It then studies how to extract the initial features of the text with the doc2vec model, generate the sentiment dictionary with the word2vec model, extract new features from the sentiment dictionary, and obtain the final text features by combining the two.

3. Following a modular design, the functions and related technologies of the system's six modules are described and analyzed, including the data analysis module, data processing module, feature extraction module, classification algorithm module, and UI interaction module.

4. A test environment is set up and the system is tested. The performance of the system and the quality of the sentiment classification are evaluated, and the feature extraction and the classification model are optimized. The method used in this system is compared with the original method. The comparison verifies the validity of the proposed method and shows that it obtains better results for public opinion sentiment classification.
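The feature-combination step described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`preprocess`, `lexicon_features`, `combine_features`), the tiny English sentiment dictionary, the negation list, and the `<num>` placeholder are all hypothetical, and the doc2vec vector is assumed to be precomputed elsewhere. The sketch only shows how dictionary-based features might be appended to an initial document vector.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical resources for illustration only. The thesis derives its
# sentiment dictionary from word2vec; that step is not reproduced here.
NEGATIONS = {"not", "no", "never"}
SENTIMENT_DICT: Dict[str, float] = {
    "good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0,
}


def preprocess(text: str) -> List[str]:
    """Lowercase, replace digit runs with a placeholder, drop punctuation."""
    text = text.lower()
    text = re.sub(r"\d+", " <num> ", text)
    return re.findall(r"<num>|[a-z']+", text)


def lexicon_features(tokens: Sequence[str]) -> List[float]:
    """Return [positive score, negative score].

    The polarity of a token is flipped only when the token immediately
    before it is a negation word (a deliberately simple assumption).
    """
    pos = 0.0
    neg = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        score = SENTIMENT_DICT.get(tok, 0.0)
        if negate:
            score = -score
        if score > 0:
            pos += score
        elif score < 0:
            neg += -score
        negate = False
    return [pos, neg]


def combine_features(doc_vector: Sequence[float],
                     tokens: Sequence[str]) -> List[float]:
    """Concatenate the initial (doc2vec-style) vector with lexicon features."""
    return list(doc_vector) + lexicon_features(tokens)


tokens = preprocess("This is not good, 100 times!")
final_features = combine_features([0.1, 0.2], tokens)
```

The combined vector is what a classifier would consume; with scikit-learn, for instance, such vectors could be passed to `SVC(kernel="rbf")`, which corresponds to the RBF-kernel support vector machine named in the abstract.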
Keywords/Search Tags: public opinion monitoring, sentiment analysis, word vector, doc2vec, SVM