Emotion Classification Based On Microblog Text Data

Posted on: 2020-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2428330578980899
Subject: Computer Science and Technology
Abstract/Summary:
With the tremendous growth of the Internet, more and more people tend to share their opinions and feelings on social media such as Twitter and Weibo. Emotion analysis is a research hotspot in natural language processing; it allows us to identify the feelings of individuals by analyzing the text they have posted. Emotion classification is a fundamental task that aims to determine the emotion category of a text, such as happy or angry. This dissertation focuses on emotion classification based on microblog text data, aiming to address the lack of annotated data for this task. The research covers the following three aspects.

First, this dissertation presents a weakly-supervised emotion classification scenario in which emoticon information in unlabeled data is leveraged to obtain a large amount of automatically annotated data. Furthermore, a joint learning approach is proposed to incorporate both human-annotated and auto-annotated data. Specifically, this approach treats emotion classification with human-annotated data as the main task and emotion classification with auto-annotated data as the auxiliary task. The main idea is to use the auxiliary representation learned from the auxiliary task to improve the performance of the main task. Empirical studies demonstrate the effectiveness of leveraging auto-annotated data, and the joint learning approach achieves better performance than simply merging the two data sets.

Second, this dissertation proposes an approach to learning emotion-specific word embeddings for emotion classification. Many word embedding algorithms are available, but they model the syntactic context of words without considering the emotion information associated with those words. As a result, words with opposite emotions but similar syntactic contexts tend to be represented as close vectors. The main idea of the proposed approach is to integrate the relationship between words and emoticons to help learn emotion-specific word embeddings. Specifically, a heterogeneous network composed of a word-document network and a word-emoticon network is constructed. Once the emotion-specific word embeddings are obtained, a long short-term memory network is trained to perform emotion classification. Empirical studies demonstrate that this approach outperforms both the straightforward baseline and some conventional word embedding algorithms.

Finally, this dissertation proposes a novel approach to emotion classification based on cross-lingual information. This approach leverages the rich English emotion corpus to improve emotion classification on the Chinese microblog corpus from a cross-lingual view. First, the Chinese microblog corpus is translated into English and the English Twitter corpus is translated into Chinese using translation tools. Then, a multi-task learning framework is used to learn from both the Chinese and English emotion corpora. The Chinese microblog corpus is regarded as the original corpus and the others are regarded as supplementary corpora. The intermediate representations based on the supplementary corpora are fused with the intermediate representation based on the original corpus. On this basis, an attention mechanism is added to obtain the final classification results. Empirical studies demonstrate that this approach can improve the performance of emotion classification significantly.
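
The emoticon-based auto-annotation mentioned in the first contribution can be illustrated with a minimal sketch. This is not the dissertation's implementation: the emoticon lexicon, the function name `auto_annotate`, and the rule of discarding posts whose emoticons disagree are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Optional, Tuple

# Hypothetical emoticon-to-emotion lexicon (assumption: the abstract
# does not list the emoticons or emotion categories actually used).
EMOTICON_TO_EMOTION = {
    ":)": "happy",
    ":D": "happy",
    ":(": "sad",
    ">:(": "angry",
}


def auto_annotate(post: str) -> Optional[Tuple[str, str]]:
    """Return (text without emoticons, emotion label), or None.

    A post is labeled only if all emoticons it contains map to the
    same emotion; posts with no emoticon or conflicting emoticons
    are skipped (an assumed filtering rule).
    """
    labels = set()
    text = post
    # Match longer emoticons first so ">:(" is not read as ":(".
    for emoticon in sorted(EMOTICON_TO_EMOTION, key=len, reverse=True):
        if emoticon in text:
            labels.add(EMOTICON_TO_EMOTION[emoticon])
            text = text.replace(emoticon, " ")
    if len(labels) != 1:
        return None
    # Collapse the whitespace left behind by removed emoticons.
    return " ".join(text.split()), labels.pop()


unlabeled_posts = [
    "great day :D",
    "stuck in traffic >:(",
    "no emoticon here",
    "mixed feelings :) :(",
]
auto_labeled = [pair for pair in map(auto_annotate, unlabeled_posts) if pair]
```

Removing the emoticon from the text keeps a downstream classifier from simply memorizing the labeling cue. Whether the dissertation does the same is not stated in the abstract.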
Keywords/Search Tags: Microblog, Emotion Classification, Weakly-supervised, Emotion-specific Word Embedding, Multi-task Learning