
Research On Feature Representation Based On Sentiment Classification

Posted on: 2020-01-05    Degree: Master    Type: Thesis
Country: China    Candidate: K R Yu    Full Text: PDF
GTID: 2428330596968165    Subject: Software engineering
Abstract/Summary:
In recent years, research on neural networks (deep learning) has developed rapidly, and natural language processing, as an applied field, is one of its main targets. Although neural networks have achieved great success in natural language processing, the demands of real applications keep growing, and many problems remain even in basic tasks such as text classification and sentiment classification. Sentiment classification is a basic natural language processing task, and a classifier's performance is sensitive to its features: the effectiveness of the features the classifier extracts and uses directly affects how well it performs. Distributed representation of features is one of the core research topics of neural networks. Because natural language features such as words are discrete, distributed representation modeling is especially important in natural language processing tasks. This thesis is motivated by a company's need for public opinion analysis and aims to complete a natural language sentiment classification task. To this end, we conduct a series of studies on feature representation for sentiment classification. The main contributions of this thesis are as follows:

1. For the sentiment classification task, we propose an approach for adapting word representations (word vectors). Most existing word vector learning schemes are decoupled from specific natural language processing tasks and are not further adjusted for them. In this thesis, we introduce the concept of a word vector sentiment component and use it to interpret the sentiment information carried in word vectors. We then use an additional sentiment lexicon to add sentiment information to pre-trained word vectors (an illustrative sketch follows this abstract). The RT and IMDB sentiment classification tasks are used to evaluate the adaptation method; compared with transferring the original word vectors, transferring the adapted word vectors improves performance across multiple models.

2. For short text, we propose a multi-level short text sentiment classifier. Word sequence features and bag-of-words features are the two main feature types used in text classification, but collecting multiple feature types in a single model can make it too complex and hard to converge. We design two basic classifiers, an LSTM-Attn classifier for sequence features and a DAN classifier for bag-of-words features, and then train and integrate multiple basic classifiers with a masked ensemble learning method (see the sketch after this abstract). The integrated model can exploit multiple feature types while avoiding the training difficulties of a single large model. The multi-level short text classifier integrates 5 LSTM-Attn classifiers and 5 DAN classifiers and achieves an accuracy of 86.21%.

3. For news text, we propose a multi-level news text sentiment classifier. A news article consists of a headline and a body. Headlines are concise and easy to judge with the short text sentiment classifier, but misjudgments can still occur, so we build a long text classifier composed of short text classifiers to classify the sentiment of the news body. The multi-level news text sentiment classifier judges the sentiment polarity of the whole article by combining the classification results of the headline and the body (a sketch of this combination also follows), and achieves an accuracy of 93.66% on a news text test set provided by the company.
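The abstract does not spell out how the sentiment lexicon is used to adapt pre-trained word vectors. The following is a minimal illustrative sketch, not the thesis's actual method: it assumes a word-to-vector dictionary and a polarity lexicon, estimates a single "sentiment direction" from the lexicon, and nudges lexicon words along it. The function name `adapt_word_vectors` and the step size `alpha` are hypothetical.

```python
import numpy as np

def adapt_word_vectors(vectors, lexicon, alpha=0.1):
    """Nudge pre-trained word vectors along a lexicon-derived sentiment direction.

    vectors: dict mapping word -> np.ndarray (pre-trained embedding)
    lexicon: dict mapping word -> +1.0 (positive) or -1.0 (negative)
    alpha:   step size controlling how strongly sentiment is injected
    """
    # Estimate a "sentiment component": the difference between the mean
    # embedding of positive lexicon words and that of negative lexicon words.
    pos = [vectors[w] for w, s in lexicon.items() if s > 0 and w in vectors]
    neg = [vectors[w] for w, s in lexicon.items() if s < 0 and w in vectors]
    direction = np.mean(pos, axis=0) - np.mean(neg, axis=0)
    direction /= np.linalg.norm(direction)

    # Shift each lexicon word's vector along the sentiment direction,
    # signed by its polarity; out-of-lexicon words are left untouched.
    adapted = dict(vectors)
    for w, polarity in lexicon.items():
        if w in adapted:
            adapted[w] = adapted[w] + alpha * polarity * direction
    return adapted
```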
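The abstract names an LSTM-Attn base classifier for sequence features, a DAN base classifier for bag-of-words features, and a masked ensemble learning method. The PyTorch sketch below shows minimal versions of the two base classifiers and, for simplicity, replaces the masked ensembling with plain probability averaging; the class names, hyperparameters, and averaging scheme are assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class DANClassifier(nn.Module):
    """Deep Averaging Network: average word embeddings, then feed-forward layers."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        mask = (token_ids != 0).unsqueeze(-1).float()
        emb = self.embed(token_ids) * mask      # zero out padding positions
        avg = emb.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.ff(avg)                     # (batch, num_classes) logits

class LSTMAttnClassifier(nn.Module):
    """BiLSTM with attention pooling over the hidden states."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        pooled = (weights * h).sum(dim=1)
        return self.out(pooled)

def ensemble_predict(models, token_ids):
    """Average the softmax outputs of all trained base classifiers."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(token_ids), dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```

Training several small base classifiers and merging their outputs reflects the design choice described in the abstract: no single model is forced to learn both sequence and bag-of-words features at once, which keeps each base model easy to train.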
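For the news-level decision, the abstract states only that the headline result and the body result are combined. The hypothetical helper below classifies the headline and each body sentence with the short text ensemble from the previous sketch and then averages the two levels with equal weights; the 0.5/0.5 weighting and the per-sentence splitting of the body are assumptions.

```python
import torch

def classify_news(headline_ids, body_sentence_ids, models):
    """Judge a whole news article from its headline and body.

    headline_ids:      (1, seq_len) token ids of the headline
    body_sentence_ids: (num_sentences, seq_len) token ids, one row per body sentence
    models:            trained short text base classifiers (e.g. from the sketch above)
    """
    with torch.no_grad():
        head_probs = torch.stack(
            [torch.softmax(m(headline_ids), dim=-1) for m in models]).mean(dim=0)
        body_probs = torch.stack(
            [torch.softmax(m(body_sentence_ids), dim=-1) for m in models]).mean(dim=0)
    # Aggregate sentence-level predictions into one body-level distribution,
    # then average headline and body evidence to decide the article's polarity.
    body_doc = body_probs.mean(dim=0, keepdim=True)
    combined = 0.5 * head_probs + 0.5 * body_doc
    return combined.argmax(dim=-1).item()
```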
Keywords/Search Tags: representation learning, word vectors, transfer learning, sentiment classification