
Research On Sentiment Classification Based On Cross-lingual Distributed Representation

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Ma
Full Text: PDF
GTID: 2428330632462776
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the Internet, people around the world are increasingly connected, but differences in language remain a major barrier to communication between regions, and the same barrier has contributed to the uneven development of natural language processing across languages. In recent years, cross-lingual techniques have been proposed to address this problem. As one of its sub-tasks, cross-lingual sentiment classification leverages the rich resources of a source language to help build sentiment classification systems for low-resource target languages.

Current cross-lingual sentiment classification methods often require cross-lingual sentiment supervision, such as bilingual sentiment dictionaries. However, for low-resource languages, constructing this supervision is itself a challenging problem. Some methods can establish a connection between two languages directly in an unsupervised way, but the words they focus on are often frequent words, such as pronouns and nouns, which lack clear sentiment polarity and therefore contribute little to sentiment classification.

To make up for these shortcomings, this study proposes an unsupervised method for constructing cross-lingual sentiment embeddings. The method requires only monolingual word embeddings in the two languages and a sentiment lexicon in the source language. It first uses a generative adversarial network to align the two embedding spaces and obtain a mapping matrix between the languages; a self-learning framework is then used to further adjust the matrix, making it more sensitive to sentiment.

Since our sentiment classification operates at the sentence level, this study also explores a variety of sentence representation methods over the cross-lingual sentiment word embeddings obtained above, including pooling and the recurrent neural networks commonly used for text encoding, and proposes a sentence representation structure based on self-attention and pooling that balances representational power against computation time. In addition to the method based on cross-lingual sentiment word embeddings, this study further compares large-scale pre-trained language models, which perform well on many NLP tasks: several cross-lingual pre-trained models are fine-tuned for cross-lingual sentiment classification. Through experiments on datasets spanning different categories and different languages, we provide thorough theoretical and experimental analysis.
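The self-learning refinement of the mapping matrix can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the adversarial initialization is replaced here by a small seed dictionary (`seed_src`/`seed_tgt` index pairs are hypothetical), and each iteration solves an orthogonal Procrustes problem on the current dictionary and then re-induces the dictionary by nearest-neighbour search, assuming unit-normalized embedding rows.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """W = argmin ||X @ W - Y||_F over orthogonal W; closed-form SVD solution."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def self_learning_align(X, Y, seed_src, seed_tgt, n_iter=5):
    """Iteratively refine a source-to-target mapping from a small seed dictionary.

    X: (n_src, d) source embeddings, rows unit-norm
    Y: (n_tgt, d) target embeddings, rows unit-norm
    seed_src, seed_tgt: index lists forming the initial dictionary
    """
    src, tgt = np.asarray(seed_src), np.asarray(seed_tgt)
    for _ in range(n_iter):
        # Fit the mapping on the current dictionary
        W = orthogonal_procrustes(X[src], Y[tgt])
        # Induce a larger dictionary: nearest target neighbour of each mapped
        # source word (cosine similarity, since rows are unit-norm)
        sims = (X @ W) @ Y.T
        src = np.arange(len(X))
        tgt = sims.argmax(axis=1)
    return W
```

In the thesis, the initial mapping comes from the adversarial alignment step and the induced dictionary is further biased toward sentiment-bearing words via the source-language sentiment lexicon; neither refinement is reproduced in this sketch.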
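The abstract does not give the exact architecture of the self-attention-plus-pooling sentence encoder, so the following is only a minimal sketch of the general idea: a learned query vector (`w`, hypothetical here) scores each token embedding, and the softmax-normalized scores drive a weighted-average pooling over the sequence. Unlike a recurrent encoder, this requires no sequential computation, which is the time-cost trade-off the text mentions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(E, w):
    """Attention-weighted pooling of token embeddings into one sentence vector.

    E: (seq_len, d) token embeddings
    w: (d,) learned attention query vector (a hypothetical parameter)
    returns: (d,) sentence representation
    """
    scores = softmax(E @ w)  # (seq_len,) attention weights, sum to 1
    return scores @ E        # weighted average of the token embeddings
```

With a zero (or uniform) query this degenerates to plain mean pooling, which makes the design easy to compare against the pure-pooling baselines the study also evaluates.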
Keywords/Search Tags: cross-lingual, sentiment classification, pre-training, distributed representation