Text Sentiment Analysis Based On Distributed Representation Learning

Posted on: 2019-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Chen
Full Text: PDF
GTID: 1368330566997554
Subject: Computer application technology
Abstract/Summary:
Sentiment analysis is one of the most active research topics in natural language processing. It aims to enable computers to automatically recognize, classify, annotate, or extract the sentiment, emotion, opinion, or evaluation that people express in natural language toward a specific target or topic. Most traditional sentiment analysis approaches use shallow representations, such as the Bag-of-Words (BOW) representation, which suffer from data sparsity and lose the original semantic information of the text. A distributed representation of text usually refers to a set of low-dimensional, continuous, dense vectors generated by neural network models to represent the text. In recent years, distributed representation learning has been widely employed in natural language processing. It not only overcomes the curse of dimensionality caused by sparse representations, but also exploits the semantic correlation and layer-by-layer composition properties of distributed representations to improve the performance of sentiment analysis, opening up a new direction for sentiment analysis research. This dissertation studies text sentiment analysis based on distributed representation learning. The main contributions cover the following four aspects.

First, sentence-level sentiment analysis based on opinion-target-related sentence type classification is investigated. Traditional sentiment classification research focuses on a one-technique-fits-all solution. Observation of sentiment corpora shows a close connection between the number of opinion targets expressed in a sentence and the difficulty of classifying its sentiment. Following a divide-and-conquer strategy, we therefore propose a sentence-level sentiment analysis approach based on opinion-target-related sentence type classification. We first extract target expressions from opinionated sentences with a BiLSTM-CRF neural sequence labeling model. We then classify the sentences into three types according to the number of opinion targets they contain. Finally, we optimize a separate sentiment classifier for each sentence type to improve the overall classification performance. Experimental results on four sentence-level English sentiment analysis datasets show that the proposed method outperforms many of the compared methods.

Second, sentence-level sentiment analysis based on distributed representation oversampling for imbalanced data is investigated. In sentiment corpora, the proportions of the different classes of training samples are commonly imbalanced. Imbalanced training data can bias classifiers based on supervised machine learning. Oversampling is a commonly used method for handling imbalanced data. However, oversampling on the traditional BOW representation suffers from the small disjunct problem, in which newly generated samples may be more similar to samples of other classes, which harms classifier performance. Because distributed representations are intrinsically semantically correlated, and representation vectors within the same category are more cohesive, we propose a sentence-level sentiment analysis approach based on distributed representation oversampling for imbalanced data. We first construct distributed representations for the sentiment corpus. We then oversample in the distributed representation space to generate new samples for the minority classes of the imbalanced training data. Finally, the classifier is trained on the balanced data. Experimental results on a single-label binary-classification English sentiment dataset and a multi-label multi-class Chinese emotion dataset show that the generated samples overcome the small disjunct problem and improve the performance of supervised machine learning methods on real-world imbalanced training data. Moreover, combining this method with the sentence-level approach based on opinion-target-related sentence type classification further improves the performance of text sentiment analysis.

Third, document-level sentiment analysis based on the layer-by-layer composition of distributed representations is investigated. Natural language has an inherent hierarchical structure: from the word level to the document level, the semantic complexity of the text gradually grows. Deep neural networks show a similar property, since they also compose features layer by layer. Most existing approaches based on layer-by-layer composition of distributed representations suffer from complex network structures, difficult training, and an inability to use sentence-level task-specific annotation. To address these issues, we propose a two-step semantic composition neural network approach, which uses supervised deep neural networks to generate sentence-level and document-level distributed representations respectively. The resulting distributed representations of review documents can be used for document-level sentiment analysis. Experimental results on three large-scale review document datasets show that the proposed method not only improves the performance of document-level sentiment analysis, but also reduces the difficulty of neural network training by making effective use of sentence-level sentiment annotation.

Fourth, multi-document sentiment analysis using user and product distributed representation learning is investigated. It is often observed that a lenient user may give a higher rating than a critical user even when they post an (almost) identical review, while popular products are likely to receive more praise than less popular ones. Many existing methods model user and product preferences to generate personalized distributed representations of users and products. These methods treat the reviews from the same user, or for the same product, as individual texts and ignore the temporal relationship between the reviews. Therefore, building on sentence-level and document-level distributed representation learning, we propose a multi-document sentiment analysis method based on user and product distributed representation learning, which embeds the temporal relations among reviews into the learned distributed representations of users and products for sentiment classification. We first use the layer-by-layer composition approach to generate distributed representations of review documents. A recurrent neural network with gated recurrent units is then used to model the temporal relationship among the reviews. Finally, a machine-learning-based classifier is applied to the concatenated distributed representations of users, products, and reviews. Experimental results on three large-scale review document datasets show that the proposed method effectively improves the performance of text sentiment analysis.

In summary, this dissertation focuses on text sentiment analysis based on distributed representation learning. It investigates approaches for optimizing text sentiment analysis that exploit the intrinsic characteristics of distributed representations and the features of the sentiment analysis task. We improve the performance of sentiment analysis through distributed representation learning at three levels: the sentence level, the document level, and the multi-document level. We hope that this research helps to promote the further development of sentiment analysis.
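
The oversampling step of the second contribution can be illustrated with a minimal sketch. The abstract does not name the oversampling algorithm, so the SMOTE-style interpolation below is an assumption, as are the function name `oversample_minority` and its parameters `n_new`, `k`, and `seed`. It takes dense sentence vectors that have already been computed, and creates each new minority sample as a random point on the segment between a minority vector and one of its nearest minority-class neighbours.

```python
import numpy as np


def oversample_minority(X, y, minority_label, n_new, k=5, seed=0):
    """Hypothetical helper: SMOTE-style oversampling on dense vectors.

    X is an (n_samples, dim) array of distributed representations,
    y is the (n_samples,) label array. Returns the data with n_new
    synthetic minority samples appended.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(X_min) - 1)

    # Pairwise squared Euclidean distances among minority vectors.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new sample.
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate: new = base + gap * (neighbour - base), gap in [0, 1).
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[pick] - X_min[base])

    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal
```

Because every synthetic vector is a convex combination of two minority vectors, it stays inside the region the minority class already occupies in the embedding space, which is the cohesiveness property the abstract relies on. The balanced arrays can then be passed to any supervised classifier.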
Keywords/Search Tags: Sentiment Analysis, Distributed Representation Learning, Distributed Representation Oversampling, Opinion Target Extraction, Layer-by-layer Composition