
Research On Key Techniques Of Sentiment Analysis Based On Representation Learning

Posted on: 2018-09-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X Wang
GTID: 1318330536981160 | Subject: Computer application technology
Abstract/Summary:
Text sentiment analysis can mine the emotional information contained in Internet text, helping us understand people's views on an entity and make decisions based on those views. However, the characteristics of Internet text challenge traditional sentiment analysis techniques. First, the amount of unlabeled text keeps growing while labeled data remains scarce, so exploiting unlabeled data to improve sentiment analysis is an open problem. Second, texts have become short and irregular, which leaves traditional bag-of-words features sparse. Third, Internet products for different application scenarios emerge in an endless stream, and targeted feature engineering is time-consuming, so it struggles to keep pace with the rapid iteration of product analysis.

Word embeddings (also known as distributed representations of words) are obtained by unsupervised training on large-scale data, so they make effective use of unlabeled data. Similar words have similar embeddings, so these features smooth the model and alleviate sparsity. Deep neural network models can automatically learn how to compose and abstract word vectors. Representation learning based on deep neural networks and word vectors therefore has the potential to address a series of challenges in sentiment analysis.

This work studies how representation learning can solve several key problems in sentiment analysis. Specifically, we study both learning representations of sentences, sequences of sentences, and word contexts with gated recurrent neural networks, and improving the word embeddings themselves. We apply these techniques to sentiment polarity classification of short text, opinion expression extraction, aspect extraction, and emotion prediction in multi-round dialogue. The main contents of this thesis cover the following four aspects.

1. Short texts express sentiment in diverse and poorly standardized ways, which makes the features for polarity classification sparse. To address this, the thesis proposes a sentiment semantic representation learning approach based on gated recurrent models. Experiments show that the method effectively identifies the polarity of a text. We also study how word vectors change during network training and, taking the internal structure of the long short-term memory (LSTM) unit into account, discuss how the unit and the vectors cooperate, revealing the mechanism by which words interact.

2. Text expresses emotion in many forms, and their common features are hard to summarize. Implicit emotional expressions often contain no emotion words and are poorly covered by traditional bag-of-words features, while existing representation learning methods lack flexibility. To solve this problem, we propose an LSTM-based method that learns abstract semantic representations of words in order to recognize and extract opinion expressions. Experimental results show that introducing a bidirectional network structure effectively improves extraction performance. We also study the LSTM's capacity for signal separation and information selection.

3. The aspects of an opinion target are closely tied to the words themselves, so the quality of the word vectors directly affects extraction performance. Word vectors suffer from several problems: a gap between a word's representation and its function, a lack of statistical information, and ambiguity of meaning and function. To address these problems, we introduce word vectors based on dependency syntax, an extension based on the outer-product matrix, and a specialization mechanism based on a specialized input gate. Experiments show that these methods effectively improve the word vectors and the performance of aspect extraction.

4. Discovering negative emotions in multi-round conversations can provide a basis for evaluating and improving dialogue technology. However, existing human-computer interaction makes it difficult to obtain negative feedback directly; one way to obtain it is to predict the user's emotion from the context of a multi-round conversation. Many factors may affect the user's emotions, so hypotheses are formed from the known dialogue sentences, and deep neural networks are built to learn representations of these hypotheses and predict the user's emotional feedback. This work fills a gap in this area. Experimental results show that a method based on a convolutional recurrent neural network not only effectively represents the text sequence and the relations in a multi-round dialogue but also predicts user emotion effectively.
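The first two contributions rest on the gated long short-term memory (LSTM) unit, used once as a sentence encoder and once bidirectionally as a per-word encoder. The sketch below is a minimal pure-Python illustration under stated assumptions, not the thesis's model: it uses the standard LSTM cell (input, forget, and output gates plus a candidate), random untrained weights, toy dimensions, and hypothetical names (`LSTMCell`, `run`, `bidirectional`, and classifier weights `w_cls`). The last forward hidden state serves as a sentence representation for polarity classification, and the concatenated forward and backward states serve as per-word features of the kind an opinion-expression tagger would consume.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with random, untrained weights (illustrative only)."""

    GATES = "ifog"  # input, forget, output gates and candidate g

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def step(self, x, h, c):
        z = x + h  # concatenate current word vector and previous hidden state
        pre = {k: [wz + bias for wz, bias in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # New cell state: keep part of the old memory, write part of the candidate.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        # New hidden state: the output gate filters the squashed cell state.
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def run(cell, xs):
    # Feed a sequence of word vectors through the cell, collecting hidden states.
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bidirectional(fwd, bwd, xs):
    # Per-word features: forward state concatenated with the backward state at the same position.
    f_states = run(fwd, xs)
    b_states = run(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(f_states, b_states)]


emb_dim, hid_dim = 4, 3
rng = random.Random(42)
sentence = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]  # 5 toy word vectors

fwd, bwd = LSTMCell(emb_dim, hid_dim, seed=1), LSTMCell(emb_dim, hid_dim, seed=2)
sent_repr = run(fwd, sentence)[-1]      # sentence representation
w_cls = [0.5, -0.2, 0.1]                # hypothetical polarity-classifier weights
p_positive = sigmoid(sum(w * h for w, h in zip(w_cls, sent_repr)))
token_feats = bidirectional(fwd, bwd, sentence)
print(len(sent_repr), len(token_feats), len(token_feats[0]))  # 3 5 6
```

In a trained system the weights would be learned by backpropagation and the per-word features would feed a tagging layer (for example over B/I/O labels); both steps are outside this sketch.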
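For the third contribution, the abstract mentions word vectors based on dependency syntax without giving details. One well-known way to obtain such vectors is the dependency-based contexts of Levy and Goldberg (2014): instead of neighbouring words in a fixed window, a word's contexts are the words it is linked to in a dependency parse, tagged with the arc label. The sketch below assumes that scheme (not necessarily the one used in the thesis), omits its preposition-collapsing step, and uses a hand-written toy parse; the function name `dependency_contexts` and the `word/rel` and `word/rel-1` context format are illustrative.

```python
# Hypothetical hand-written parse: (index, word, head_index, relation); head 0 = ROOT.
parse = [
    (1, "the", 3, "det"),
    (2, "battery", 3, "compound"),
    (3, "life", 5, "nsubj"),
    (4, "is", 5, "cop"),
    (5, "great", 0, "root"),
]


def dependency_contexts(parse):
    """Return (word, context) pairs built from dependency arcs."""
    words = {idx: w for idx, w, _, _ in parse}
    pairs = []
    for idx, word, head, rel in parse:
        if head == 0:
            continue  # the root arc has no head word to pair with
        head_word = words[head]
        pairs.append((head_word, f"{word}/{rel}"))    # head sees its modifier
        pairs.append((word, f"{head_word}/{rel}-1"))  # modifier sees its head (inverse arc)
    return pairs


pairs = dependency_contexts(parse)
for w, ctx in pairs:
    print(w, ctx)
```

The resulting pairs would then be fed to a skip-gram trainer with negative sampling that accepts arbitrary contexts. Because each context carries a syntactic label, the learned vectors tend to reflect a word's grammatical function rather than mere topical co-occurrence.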
Keywords: sentiment analysis, representation learning, word embedding, deep neural network