
Research Of Sentence-level Sentiment Classification For Text Based On Deep Neural Network

Posted on: 2017-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xu
Full Text: PDF
GTID: 2348330503981834
Subject: Computer Science and Technology
Abstract/Summary:
With the popularity of Web 2.0 applications, Internet users are no longer limited to merely "reading" web content; they are empowered to "write" it as well. People share their stories and reviews, producing large amounts of unstructured data, and fully mining these data can yield very valuable information. Traditional machine learning methods often use bag-of-words representations, which cannot properly capture more complex linguistic phenomena. Recently, word embeddings, which are continuous and dense, can express a notion of "distance" between words, and capture semantic and syntactic information well, have attracted wide attention and been applied to natural language processing tasks. However, since a word embedding represents only a single word, semantic composition must be considered in order to represent phrases and sentences, and sentiment analysis of sentences in online reviews remains a challenging task. In recent years, recursive autoencoder (RAE) and recurrent neural network (RNN) methods have been proposed to perform semantic composition for sentence-level sentiment analysis with good performance, but they still have shortcomings. To address these issues, taking users' subjective online reviews as the research object, the main work of this paper consists of the following two parts:

(1) We propose a method that combines the HowNet lexicon to train a bidirectional phrase recursive autoencoder (CHL-Bi-PRAE). Previous work tends to generate very deep parse trees, so the training cost is high; it requires a large amount of labeled data for each node during learning; furthermore, RAE methods mainly compose adjacent words or phrases with a greedy strategy, which makes it difficult to capture semantic relations between distant words.
To solve these problems, this paper first constructs the phrase recursive autoencoder (PRAE) model, then calculates the sentiment orientation of each node with the HowNet lexicon, and uses these orientations as sentiment labels to train the softmax classifiers of PRAE (CHL-PRAE). In addition, the CHL-PRAE model is trained bidirectionally to capture global information about the sentence, so that representations are learned more fully (CHL-Bi-PRAE). Compared with RAE and supervised methods such as support vector machines (SVM) and Naïve Bayes on English and Chinese datasets, CHL-Bi-PRAE provides the best performance for sentence-level sentiment analysis.

(2) We propose long short-term memory over rhetorical structure theory (RST-LSTM). Although the LSTM network solves the vanishing-gradient problem of RNNs, previously explored LSTM structures are almost always linear chains. Tree-LSTM was later put forward and achieved good results, which illustrates that an LSTM organized over the structure of the text, rather than over a flat sequence, exploits that structure more reliably. Based on this, this paper proposes importing rhetorical structure theory (RST) to parse text. We construct the LSTM network on RST parse trees, making full use of the LSTM's characteristics: the model automatically enhances the nucleus information of the text and filters the satellite information. Furthermore, this approach makes the representations reflect the relationships between text segments, improving the semantic representations.
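The abstract does not give implementation details of CHL-PRAE; as a minimal illustrative sketch (not the thesis's actual model), one RAE composition step encodes two child vectors into a parent and scores reconstruction error, while a toy polarity lexicon standing in for HowNet supplies node-level sentiment labels. All names, dimensions, and lexicon entries below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy embedding dimensionality

# Toy polarity lexicon standing in for HowNet (1 = positive, 0 = negative).
LEXICON = {"good": 1, "great": 1, "bad": 0, "poor": 0}

# Parameters of a single autoencoder composition, shared across the tree.
W_enc = rng.normal(scale=0.1, size=(DIM, 2 * DIM))  # encoder: [c1; c2] -> parent
b_enc = np.zeros(DIM)
W_dec = rng.normal(scale=0.1, size=(2 * DIM, DIM))  # decoder: parent -> [c1'; c2']
b_dec = np.zeros(2 * DIM)

def compose(c1, c2):
    """One RAE step: encode two children into a parent, score reconstruction."""
    children = np.concatenate([c1, c2])
    parent = np.tanh(W_enc @ children + b_enc)
    recon = W_dec @ parent + b_dec
    loss = float(np.sum((recon - children) ** 2))
    return parent, loss

def lexicon_label(words):
    """Lexicon-derived sentiment label for a node (majority polarity of hits)."""
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return round(sum(hits) / len(hits)) if hits else None

# Compose two toy word vectors; the label would supervise a softmax at this node.
c1, c2 = rng.normal(size=DIM), rng.normal(size=DIM)
parent, recon_loss = compose(c1, c2)
```

In the full method such lexicon-derived labels replace per-node manual annotation, which is what removes the need for large amounts of labeled data at every tree node.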
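To make the RST-LSTM idea concrete, a child-sum Tree-LSTM cell of the kind commonly built over parse trees can be sketched as follows; this is a generic toy cell with random weights, not the thesis's RST-LSTM, and the per-child forget gates are the natural place where nucleus and satellite spans of an RST tree could be treated differently (kept uniform here):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 4  # toy hidden size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One shared set of child-sum Tree-LSTM parameters (toy sizes, random weights).
W = {g: rng.normal(scale=0.1, size=(DIM, DIM)) for g in "iofu"}
U = {g: rng.normal(scale=0.1, size=(DIM, DIM)) for g in "iofu"}
b = {g: np.zeros(DIM) for g in "iofu"}

def tree_lstm_node(x, children):
    """children: list of (h, c) pairs from child nodes (empty at leaves)."""
    h_sum = sum((h for h, _ in children), np.zeros(DIM))
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])  # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])  # output gate
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])  # candidate cell state
    c = i * u
    # One forget gate per child: a nucleus child could be passed through
    # strongly while a satellite child is attenuated.
    for h_k, c_k in children:
        f_k = sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"])
        c = c + f_k * c_k
    h = o * np.tanh(c)
    return h, c

# Two leaf text segments composed under one RST-style parent node.
leaf1 = tree_lstm_node(rng.normal(size=DIM), [])
leaf2 = tree_lstm_node(rng.normal(size=DIM), [])
root_h, root_c = tree_lstm_node(rng.normal(size=DIM), [leaf1, leaf2])
```

Running the cell bottom-up over an RST parse tree yields a root state that summarizes the whole text while letting the gates weight discourse-central (nucleus) material more heavily than supporting (satellite) material.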
Keywords/Search Tags:Recursive Autoencoder, LSTM, HowNet Lexicon, Sentiment Analysis, Rhetorical Structure Theory