
The Key Technologies Of Cross-lingual Aspect Sentiment Classification Towards E-commerce Reviews

Posted on: 2022-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Wang
Full Text: PDF
GTID: 2518306740982989
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology and global e-commerce, cross-border e-commerce platforms such as Amazon, Tmall Global, and Global Shopping have become an indispensable shopping channel, and hundreds of millions of consumers review goods and services there and share their shopping experiences. Leveraging Natural Language Processing (NLP) to analyze the sentiment of review text that carries users' emotions is of great reference value to both consumers and producers. However, sentiment corpus resources are unevenly distributed across languages in both quantity and quality, and manual annotation is time-consuming and laborious, which makes sentiment polarity classification infeasible for low-resource languages. Cross-lingual sentiment classification, which uses a language with rich corpora to assist sentiment classification in a language with poor corpora, is therefore a hot research topic in NLP. Existing research, however, focuses on the document or sentence level, while cross-lingual aspect-level sentiment classification has been largely neglected. To address this gap, this thesis studies the key technologies of cross-lingual aspect sentiment classification for e-commerce reviews. The specific research contents are as follows:

Firstly, we propose a cross-lingual aspect sentiment classification method based on translation matching. The method uses machine translation tools to translate the source-language corpus into the target language, then employs a multi-head self-attention layer and an adaptive merge layer to build aspect-level representations for cross-lingual aspect sentiment classification. To counter the domain shift introduced by machine translation tools, a domain adaptation technique called translation matching reduces the distribution shift between translated and native text. In addition, because the sentiment polarity of an aspect depends on a group of words or phrases related to that aspect rather than on the overall expression of the sentence, the method first uses the multi-head self-attention layer to model the fine-grained interaction between the aspect and the sentence in the translated target corpus, and then obtains the aspect-level vector representation through the adaptive merge layer, making full use of the effective aspect sentiment information. Experimental results show that the proposed method outperforms the baselines on the cross-lingual aspect sentiment classification task, and ablation studies further verify its effectiveness.
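The abstract does not give implementation details for this architecture, so the following is only a minimal PyTorch sketch of the general idea: aspect tokens attend over the sentence through a multi-head attention layer, and a learned gate (one plausible reading of the "adaptive merge layer") pools the attended tokens into a single aspect-level vector. All module names, the gating form, and hyper-parameters are assumptions for illustration, not the thesis's published design.

```python
import torch
import torch.nn as nn

class AspectSentenceEncoder(nn.Module):
    """Hypothetical sketch: multi-head attention models the aspect-sentence
    interaction; a gated pooling step plays the role of the adaptive merge."""

    def __init__(self, hidden: int = 768, heads: int = 8, classes: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.gate = nn.Linear(hidden, 1)        # per-token merge weight
        self.classifier = nn.Linear(hidden, classes)

    def forward(self, sentence: torch.Tensor, aspect: torch.Tensor):
        # sentence: (batch, seq_len, hidden); aspect: (batch, asp_len, hidden)
        # Each aspect token queries the sentence, yielding a fine-grained,
        # aspect-conditioned view of the review.
        attended, _ = self.attn(aspect, sentence, sentence)
        weights = torch.softmax(self.gate(attended), dim=1)   # (batch, asp_len, 1)
        aspect_vec = (weights * attended).sum(dim=1)          # (batch, hidden)
        return self.classifier(aspect_vec)                    # polarity logits
```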
Secondly, we propose a cross-lingual aspect sentiment classification method based on reinforced distillation, which combines a knowledge distillation framework with a token selection mechanism. Specifically, the method first trains a high-performing source-language classifier as the teacher network; its parameters are frozen during training, and it provides a softened aspect sentiment polarity distribution for the target-language classifier, i.e., the student network, to achieve cross-lingual knowledge distillation (a sketch of this objective is given after the abstract). Furthermore, compared with native text, machine translation output contains unusual words and translation ambiguities; such noisy words carry no reference value for aspect sentiment and make it harder for the target-language classifier to model the fine-grained interaction between aspects and sentences. Therefore, this thesis proposes an aspect-sensitive token selector, trained with a reinforcement learning algorithm, to filter noisy words out of translated sequences. Experimental results show that the proposed method effectively alleviates the noisy-word problem in translated sentences, and within the cross-lingual distillation framework the target classifier produces higher-quality aspect-level vector representations and generalizes better.

Finally, this thesis proposes a cross-lingual aspect sentiment classification method based on cross-lingual pretraining. The method introduces two cross-lingual pretraining tasks that strengthen the pretrained model's cross-lingual alignment ability, so that the model maps the source and target languages into the same vector space and achieves cross-lingual aspect sentiment classification without any external corpus resources or machine translation tools. In the pretraining stage, this thesis proposes a new method based on <source language, target translation> sentence pairs to improve the cross-lingual alignment ability of multilingual BERT. Specifically, the Masked Language Model (MLM) task encourages the model to align source-language and target-translation representations, enhancing word-level semantic alignment and cross-lingual representation, while the Next Sentence Prediction (NSP) task predicts whether the source-language and target-language sentences in each pair express the same semantics, enhancing sentence-level cross-lingual representation. In the fine-tuning stage, the source corpus with aspect-level annotations is used to fine-tune the model, and part of the model parameters are frozen to avoid catastrophic forgetting. Experimental results show that this two-stage training method significantly improves cross-lingual aspect sentiment classification performance, and in zero-resource scenarios achieves performance equal to or even better than machine-translation-based cross-lingual approaches.
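As a rough illustration of the distillation objective in the second method, the sketch below combines a frozen teacher's temperature-softened polarity distribution with the usual hard-label loss, in the standard Hinton-style formulation; the temperature, loss weighting, and function names are assumptions rather than values from the thesis, and the reinforcement-learned token selector is omitted.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Hypothetical cross-lingual distillation loss: KL to the frozen
    teacher's softened distribution plus cross-entropy on hard labels."""
    with torch.no_grad():  # the teacher is frozen; no gradients flow into it
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```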
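For the third method, the sketch below shows one plausible way to assemble the <source language, target translation> sentence pairs for the NSP-style task on multilingual BERT; the negative-sampling scheme, label convention, and sequence length are assumptions, since the abstract only states that NSP predicts whether the two sentences express the same semantics.

```python
import random
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

def make_nsp_pair(src_sentences, tgt_translations, i):
    """Hypothetical pair builder: positive = a sentence with its own
    translation (label 1); negative = a random other translation (label 0)."""
    if random.random() < 0.5:
        partner, label = tgt_translations[i], 1
    else:
        j = random.choice([k for k in range(len(tgt_translations)) if k != i])
        partner, label = tgt_translations[j], 0
    enc = tokenizer(src_sentences[i], partner, truncation=True,
                    padding="max_length", max_length=128)
    return enc, label  # MLM masking is applied to enc["input_ids"] downstream
```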
Keywords/Search Tags: Cross-lingual Aspect Sentiment Classification, Domain Adaptation, Attention Mechanism, Knowledge Distillation, Cross-lingual Pretraining