
Research On Sentiment Classification Based On Cross-lingual Distributed Representation

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Ma
Full Text: PDF
GTID: 2428330632462776
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the Internet, people around the world are increasingly connected, but differences in language remain a major barrier to communication between regions, and the same barrier has contributed to the uneven development of natural language processing across languages. In recent years, cross-lingual techniques have been proposed to address this problem. As one of its sub-tasks, cross-lingual sentiment classification leverages the rich resources of a source language to help build sentiment classification systems for low-resource target languages.

Current cross-lingual sentiment classification methods often require cross-lingual sentiment supervision, such as bilingual sentiment dictionaries. However, for low-resource languages, constructing this supervision is itself a challenging problem. Some methods can establish a connection between two languages directly in an unsupervised way, but the words they focus on are often frequent words, such as pronouns and nouns, which lack clear sentiment polarity and therefore contribute little to sentiment classification.

To make up for these shortcomings, this study proposes an unsupervised method for constructing cross-lingual sentiment embeddings. The method requires only monolingual word embeddings in the two languages and a sentiment lexicon in the source language. It first uses a generative adversarial network to align the two embedding spaces and obtain a mapping matrix between the languages; a self-learning framework is then used to further adjust the matrix, making it more sensitive to sentiment.

Since our sentiment classification operates at the sentence level, this study also explores a variety of sentence representation methods over the cross-lingual sentiment word embeddings obtained above, including pooling and the recurrent neural networks commonly used for text encoding, and proposes a sentence representation structure based on self-attention and pooling that balances representational power against computation time. In addition to the method based on cross-lingual sentiment word embeddings, this study further compares large-scale pre-trained language models, which perform well on many NLP tasks: several cross-lingual pre-trained models are fine-tuned for cross-lingual sentiment classification. Through experiments on datasets spanning different categories and different languages, we provide thorough theoretical and experimental analysis.
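The self-learning refinement of the mapping matrix can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the adversarial initialization is replaced here by a small seed dictionary (`seed_src`/`seed_tgt` index pairs are hypothetical), and each iteration solves an orthogonal Procrustes problem on the current dictionary and then re-induces the dictionary by nearest-neighbour search, assuming unit-normalized embedding rows.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """W = argmin ||X @ W - Y||_F over orthogonal W; closed-form SVD solution."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def self_learning_align(X, Y, seed_src, seed_tgt, n_iter=5):
    """Iteratively refine a source-to-target mapping from a small seed dictionary.

    X: (n_src, d) source embeddings, rows unit-norm
    Y: (n_tgt, d) target embeddings, rows unit-norm
    seed_src, seed_tgt: index lists forming the initial dictionary
    """
    src, tgt = np.asarray(seed_src), np.asarray(seed_tgt)
    for _ in range(n_iter):
        # Fit the mapping on the current dictionary
        W = orthogonal_procrustes(X[src], Y[tgt])
        # Induce a larger dictionary: nearest target neighbour of each mapped
        # source word (cosine similarity, since rows are unit-norm)
        sims = (X @ W) @ Y.T
        src = np.arange(len(X))
        tgt = sims.argmax(axis=1)
    return W
```

In the thesis, the initial mapping comes from the adversarial alignment step and the induced dictionary is further biased toward sentiment-bearing words via the source-language sentiment lexicon; neither refinement is reproduced in this sketch.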
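The abstract does not give the exact architecture of the self-attention-plus-pooling sentence encoder, so the following is only a minimal sketch of the general idea: a learned query vector (`w`, hypothetical here) scores each token embedding, and the softmax-normalized scores drive a weighted-average pooling over the sequence. Unlike a recurrent encoder, this requires no sequential computation, which is the time-cost trade-off the text mentions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(E, w):
    """Attention-weighted pooling of token embeddings into one sentence vector.

    E: (seq_len, d) token embeddings
    w: (d,) learned attention query vector (a hypothetical parameter)
    returns: (d,) sentence representation
    """
    scores = softmax(E @ w)  # (seq_len,) attention weights, sum to 1
    return scores @ E        # weighted average of the token embeddings
```

With a zero (or uniform) query this degenerates to plain mean pooling, which makes the design easy to compare against the pure-pooling baselines the study also evaluates.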
Keywords/Search Tags: cross-lingual, sentiment classification, pre-training, distributed representation