Short Text Sentiment Classification Based On Deep Learning

Posted on: 2019-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: L D Hao
Full Text: PDF
GTID: 2428330566960752
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, massive amounts of comment data have been generated online. This data is characterized by short length, colloquial language, and sparse features. Sentiment analysis of such data is valuable to individuals, businesses, and countries. Traditional research tends to perform sentiment classification by combining machine learning with natural language features. However, feature selection requires the knowledge of domain experts, which incurs labor costs. In recent years, the development of deep learning has offered a new perspective on sentiment classification, and its automatic feature extraction has attracted many researchers. After studying and summarizing text representation methods and common deep learning models, we investigate improvements to the convolutional neural network (CNN) and Long Short-term Memory (LSTM) models, and propose the Multi-Mixed Convolutional Neural Network (MMCNN) and the Subject-oriented Attention-based LSTM (SA-LSTM). Experiments on real comment data sets verify the effectiveness of both models for short text sentiment classification. The main contributions of the paper are as follows:

1. To address slow crawling and crawlers being banned by anti-crawler programs, we design a multi-machine distributed crawling architecture that reduces the number of requests any single IP sends to the server by distributing crawlers across multiple machines. In addition, setting sleep times and adding an exception handling mechanism makes the crawler highly stable.

2. To address the training problems caused by high-dimensional vectors, we use word vectors trained by word embedding, which avoid the "curse of dimensionality" and the loss of association between words. We also design a cyclic word vector filling method and a random word vector filling method to fill the comment word vector matrix, which solves the sparseness of the matrix caused by comments of different lengths.

3. To address the overly simple feature extraction of the standard convolutional neural network, we propose MMCNN. Rather than deepening the network, the model widens the convolutional neural network structure. It combines multi-channel convolutional features with pooling-layer features to enrich feature extraction, which improves the network's short text classification accuracy.

4. To address the difficulty of classifying comments whose subject is inconsistent, we design a subject discrimination algorithm. The algorithm identifies the subject by integrating historical corpus information, context information, and domain information. We identify the subject of each short sentence in the comment data and then generate the subject word vector.

5. To address the fact that a standard LSTM cannot focus on key information because every element of the input sequence has the same influence, we design SA-LSTM, which focuses on the subject. The LSTM hidden layer and the subject word vector are combined in an added attention layer that computes a probability distribution over the input sequence, which effectively highlights the influence of keywords and improves the performance of the LSTM model for short text classification.

In summary, this paper proposes MMCNN and SA-LSTM. The two models overcome the dependence of unsupervised learning on dictionaries and of supervised learning on features, and have implications for short text sentiment classification.
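
The abstract names the two word vector filling methods of contribution 2 but does not define them, so the following is only a minimal sketch of one plausible reading. It assumes that cyclic filling repeats the comment's own word vectors in order until a fixed length is reached, and that random filling appends word vectors drawn at random from the same comment. The function names `pad_cyclic` and `pad_random` and the parameter `max_len` are hypothetical and do not come from the thesis.

```python
import random
from typing import List, Optional

Vector = List[float]


def pad_cyclic(vectors: List[Vector], max_len: int) -> List[Vector]:
    """Fill a comment matrix to max_len rows by cycling through its own rows.

    Assumption: "cyclic filling" means repeating the comment's word vectors
    in their original order. Comments longer than max_len are truncated.
    """
    if not vectors:
        raise ValueError("comment has no word vectors")
    return [vectors[i % len(vectors)] for i in range(max_len)]


def pad_random(
    vectors: List[Vector], max_len: int, rng: Optional[random.Random] = None
) -> List[Vector]:
    """Fill a comment matrix to max_len rows with randomly chosen own rows.

    Assumption: "random filling" means appending word vectors sampled with
    replacement from the same comment. Comments longer than max_len are
    truncated.
    """
    if not vectors:
        raise ValueError("comment has no word vectors")
    rng = rng or random.Random()
    kept = vectors[:max_len]
    filler = [rng.choice(vectors) for _ in range(max_len - len(kept))]
    return kept + filler


# Usage: a three-word comment with 2-dimensional word vectors, padded to 5 rows.
comment = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
cyclic = pad_cyclic(comment, 5)   # rows: w0, w1, w2, w0, w1
randomized = pad_random(comment, 5, random.Random(0))
```

Under this reading, every row of the padded matrix is a real word vector from the comment rather than a zero vector, which is one way the filled matrix could avoid the sparseness the abstract describes.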
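
The attention layer described in contribution 5 can be illustrated with a small sketch. The abstract does not give the scoring function, so this example assumes a plain dot product between each LSTM hidden state and the subject word vector, followed by a softmax to obtain the probability distribution over the input sequence. It also assumes that hidden states and the subject vector share the same dimension. The name `subject_attention` is hypothetical, and the hidden states are supplied by hand rather than produced by a real LSTM.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subject_attention(
    hidden_states: List[Vector], subject_vec: Vector
) -> Tuple[List[float], Vector]:
    """Weight hidden states by their similarity to the subject vector.

    Assumption: the score is a dot product; the thesis may use a different
    (for example, learned) scoring function.
    Returns the attention weights and the weighted sum of hidden states.
    """
    scores = [
        sum(h_i * s_i for h_i, s_i in zip(h, subject_vec)) for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Usage: three time steps with 2-dimensional hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
subject = [1.0, 0.0]
weights, context = subject_attention(hidden, subject)
```

In this toy input, the first and third time steps align with the subject vector and receive larger weights than the second, which is the intended effect of letting the subject highlight relevant words instead of treating every position equally.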
Keywords/Search Tags: Sentiment Classification, Word Vector, Convolutional Neural Network, Attention Mechanism, Long Short-term Memory