Research On Microblog Sentiment Classification

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Chen
GTID: 2308330482995036
Subject: Computer software and theory
Abstract/Summary:
Text sentiment classification (public opinion analysis) is an important branch of text mining that has attracted growing attention in recent years. With the rise of microblogging, sentiment analysis of short texts has become an increasingly active topic. Microblog text contains many unknown and newly coined words that have not been added to existing emotion polarity vocabularies, which significantly degrades classification results. Automatically expanding the emotion dictionary is therefore one of the key problems in microblog research. The traditional sentiment classification model based on an emotion dictionary is easy to implement and classifies quickly, but its accuracy is mediocre, and constructing an accurate emotion dictionary requires background knowledge of the language, which raises the barrier to research on microblog sentiment classification. To judge the polarity of unknown and new words in microblogs, we propose an algorithm that combines pointwise mutual information with information retrieval to extend the emotion dictionary automatically.
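The idea of scoring a new word by pointwise mutual information against seed words can be illustrated with a minimal sketch. This is not the thesis's implementation: it follows the general semantic-orientation form of PMI-IR, and it assumes that "information retrieval" hit counts can be approximated by counting documents in a small in-memory corpus. The function names (`hits_any`, `so_pmi_ir`, `expand_lexicon`), the smoothing constant, the threshold, and the English toy tokens are all hypothetical.

```python
import math

def hits_any(corpus, seeds, word=None):
    """Count documents containing any seed (and `word`, if given).

    Stand-in for search-engine hit counts; each document is a set of tokens.
    """
    return sum(
        1 for doc in corpus
        if (word is None or word in doc) and any(s in doc for s in seeds)
    )

def so_pmi_ir(word, corpus, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation: PMI(word, positive seeds) - PMI(word, negative seeds).

    The smoothing constant (an assumption) avoids division by zero.
    """
    co_pos = hits_any(corpus, pos_seeds, word) + smoothing
    co_neg = hits_any(corpus, neg_seeds, word) + smoothing
    n_pos = hits_any(corpus, pos_seeds) + smoothing
    n_neg = hits_any(corpus, neg_seeds) + smoothing
    return math.log2((co_pos * n_neg) / (co_neg * n_pos))

def expand_lexicon(candidates, corpus, pos_seeds, neg_seeds, threshold=1.0):
    """Assign a polarity to each candidate whose score clears the threshold."""
    added = {}
    for word in candidates:
        score = so_pmi_ir(word, corpus, pos_seeds, neg_seeds)
        if score >= threshold:
            added[word] = "positive"
        elif score <= -threshold:
            added[word] = "negative"
    return added

# Toy corpus: already-segmented posts represented as token sets.
corpus = [
    {"lit", "good"}, {"lit", "great"},
    {"meh", "bad"}, {"meh", "awful"},
    {"good", "day"}, {"bad", "day"},
]
pos_seeds = {"good", "great"}
neg_seeds = {"bad", "awful"}
new_entries = expand_lexicon(["lit", "meh", "day"], corpus, pos_seeds, neg_seeds)
```

In this toy corpus, "day" co-occurs equally with both seed sets, so its score is zero and it is left out of the dictionary. Real microblog text would first need word segmentation and a much larger corpus or search index.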
To address the low classification accuracy of the traditional model and its dependence on background knowledge, we build a microblog text sentiment classification model based on deep learning and propose an algorithm that combines the traditional model with the deep learning model. The details are as follows. First, new word recognition and automatic expansion of the microblog emotion dictionary: to solve the dictionary expansion problem of the traditional model, we use the PMI-IR algorithm (a combination of pointwise mutual information and information retrieval) to identify unknown and new words in microblogs and then add them to the dictionary. Second, we build an emotion lexicon and then develop discrimination rules based on it to classify text. The dictionary is divided into four parts: a basic emotion dictionary (a negative emotion dictionary and a positive emotion dictionary), a degree adverb dictionary, a conjunction dictionary, and a negation word dictionary. We collected several mainstream emotion dictionaries from the web: the National Taiwan University NTUSD Simplified Chinese emotional polarity dictionary, the HowNet emotion lexicon, and the Chinese emotion vocabulary ontology of Dalian University of Technology.
These emotion dictionaries were merged, duplicate entries were removed, and a considerable part of the vocabulary was corrected and optimized. Third, we construct a deep learning model for text and use the traditional model to collect its training corpus. Because the traditional model has low classification accuracy, we build a microblog sentiment classification model based on LSTM (long short-term memory), one of the deep learning models. Since this model is trained with supervision, it needs a large corpus that has already been labeled by category, so we use the set of texts classified by the traditional model as the training corpus for the deep learning model. Combining the two kinds of models helps improve classification accuracy.
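As a rough illustration of how a dictionary-based classifier could label a corpus for later supervised training, the sketch below scores text with a basic emotion dictionary, a degree adverb dictionary, and a negation word dictionary. It is only a sketch under stated assumptions, not the thesis's rule set. The tiny English word lists, the weights, and the `margin` filter that keeps only confidently scored texts are hypothetical, and the conjunction dictionary is omitted for brevity.

```python
# Hypothetical toy dictionaries; the thesis merges NTUSD, HowNet, and the DUT ontology.
POSITIVE = {"good", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate"}
NEGATIONS = {"not", "never"}
DEGREES = {"very": 2.0, "slightly": 0.5}

def score_tokens(tokens):
    """Sum emotion-word polarities, applying preceding negations and degree adverbs."""
    score = 0.0
    sign = 1.0    # flipped by each negation word before the next emotion word
    weight = 1.0  # scaled by each degree adverb before the next emotion word
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREES:
            weight *= DEGREES[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            score += sign * weight * polarity
            sign, weight = 1.0, 1.0  # modifiers apply to one emotion word only
    return score

def pseudo_label(texts, margin=1.0):
    """Label texts with the rule-based model, keeping only confident ones.

    The resulting (text, label) pairs could serve as training data for a
    supervised model such as an LSTM; the margin filter is an assumption.
    """
    labeled = []
    for text in texts:
        s = score_tokens(text.lower().split())
        if abs(s) >= margin:
            labeled.append((text, "positive" if s > 0 else "negative"))
    return labeled

training_corpus = pseudo_label(["I love it", "not good", "the weather"])
```

Texts with no emotion words, such as "the weather" here, score zero and are dropped rather than given a label, which keeps some noise out of the automatically collected training set.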
Keywords/Search Tags:Microblog Text Sentiment Classification, Emotion Dictionary, Deep learning, LSTM, PMI-IR