Research On Word Vector-based Sentiment Classification

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: P Chen
GTID: 2348330542491151
Subject: Computer Science and Technology
Abstract/Summary:
Sentiment analysis recognizes sentiment information by mining and analyzing the content of Internet texts, so that users' opinions on a given product can be understood efficiently and used as decision support for businesses and other users. However, the characteristics of today's Internet text pose a serious challenge to sentiment analysis. First, the volume of Internet text is growing explosively, and most of it is unlabeled. Moreover, emotional expression in these texts is increasingly concise and informal, which causes a serious sparsity problem for traditional bag-of-words features. In addition, manual feature extraction is time-consuming and transfers poorly between systems, so it is difficult to adapt to rapidly changing text analysis needs.

In recent years, researchers have begun to study word vector-based methods that extract text features automatically. Word vectors, the distributed representations of words, can be obtained through unsupervised training that makes effective use of large amounts of unlabeled data. Similar words learn similar word vectors, which has a smoothing effect when the vectors are used as features and effectively alleviates the sparsity problem. However, the traditional word vector learning model has problems of its own: it learns from context, so it captures the semantic and grammatical information of text but ignores sentiment information, and therefore cannot be applied effectively to sentiment analysis. Moreover, in sentence-level and document-level sentiment analysis tasks, word vector-based text feature representations do not consider the word order within sentences, which also affects the final sentiment classification result.

To address these problems in sentiment classification, the main research contributions of this paper are as follows.

To integrate sentiment information into the learning process of word vectors, this paper proposes a vector learning framework based on the GloVe model, which embeds the sentiment information of words during training. According to the different measures of distance between vectors, this paper uses two different sentiment information fusion methods to construct the word vector learning model. To demonstrate that the learned vectors capture both the semantic and the emotional information of the text, this paper conducts qualitative and quantitative comparative experiments on Chinese and English datasets. The experimental results show that the proposed word vector learning model effectively improves the quality of the word vectors and thus improves the accuracy of sentiment classification.

To account for the influence of word order on sentiment classification, this paper proposes a text sentiment classification model that combines the sentiment word vectors with a convolutional neural network. First, the learned vectors are used to construct the input matrix of all words in the text, and local features of different granularity are then extracted with convolution kernels of different sizes. Finally, a fixed-length feature of the text is obtained through a max pooling strategy and used to classify the text sentiment. We conducted comparative experiments at different granularities (word level, document level) on Chinese and English datasets. The results confirm that our word vector model effectively leverages both the sentiment information and the semantic information of the text and thus handles the sentiment analysis task better, and that our model generalizes better.
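The convolution and pooling steps described above can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the function names (`conv_max_pool`, `text_features`, `make_kernels`), the ReLU activation, the kernel heights 2, 3 and 4, and the random kernel initialization are all assumptions made for illustration. The point it shows is that max pooling over time yields a feature vector whose length depends only on the number of kernels, not on the length of the text.

```python
import random


def conv_max_pool(sentence_matrix, kernel, bias=0.0):
    """Slide one kernel (h rows x d columns) over the sentence matrix
    (n rows x d columns, one row per word vector) and return the
    maximum activation over all window positions."""
    h = len(kernel)
    n = len(sentence_matrix)
    if n < h:
        raise ValueError("text is shorter than the kernel height; pad it first")
    best = float("-inf")
    for start in range(n - h + 1):
        # Dot product between the kernel and a window of h consecutive word vectors.
        s = bias
        for r in range(h):
            s += sum(x * w for x, w in zip(sentence_matrix[start + r], kernel[r]))
        activation = max(0.0, s)  # ReLU (an assumption; the abstract names no activation)
        best = max(best, activation)
    return best


def text_features(sentence_matrix, kernels):
    """One pooled value per kernel: a fixed-length feature for any text length."""
    return [conv_max_pool(sentence_matrix, k) for k in kernels]


def make_kernels(dim, heights=(2, 3, 4), per_height=2, seed=0):
    """Hypothetical random kernels; a trained model would learn these weights."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(h)]
        for h in heights
        for _ in range(per_height)
    ]
```

In a full classifier, the rows of `sentence_matrix` would be the learned sentiment word vectors, and the pooled feature list would be passed to a classification layer. Kernels of different heights cover windows of two, three or four consecutive words, which is how local word order enters the representation.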
Keywords/Search Tags: Word Vector, Sentiment Analysis, Machine Learning, Sentiment Information, Convolutional Neural Network