
The Research On Learning Cross-lingual Word Embeddings Based On Adversarial Training

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Li
GTID: 2428330614460375
Subject: Computer application technology
Abstract/Summary:
Cross-lingual word embeddings (CLWEs) aim to learn the word embedding space of a low-resource target language with the help of the word embedding space of a resource-rich source language. Recently, generative adversarial networks (GANs) have been successfully applied to learn cross-lingual word embeddings in an unsupervised manner. GAN-based CLWE models treat the source and target word embeddings as two distributions and attempt to force the source embedding distribution to align with the target embedding distribution. This dissertation focuses on learning cross-lingual word embeddings with adversarial training. The main contributions are as follows:

(1) Aligning cross-lingual word embeddings requires abundant target-side information as a reliable standard. However, existing GAN-based CLWE models generally overlook this point and fail to exploit the target-side information effectively, which may lead to suboptimal cross-lingual word embeddings. To address this problem, a novel model, a Wasserstein GAN based on an autoencoder with back-translation, is proposed. It establishes a reliable standard for the alignment process by reusing the target-side information. The proposed model first trains an autoencoder-based Wasserstein GAN to learn preliminary bidirectional cross-lingual mappings, and then performs back-translation on the target side using the obtained bidirectional mappings. Experimental results on three language pairs demonstrate the effectiveness of the proposed model.

(2) Compared with high-frequency words, low-frequency words carry poor semantic information. Their embeddings are harder to align, and they disturb the alignment of the embedding distributions as a whole. To tackle the distributional disturbance caused by the embeddings of low-frequency words (LFEs), a perturbed Cramér GAN is proposed for learning cross-lingual word embeddings. The proposed model constructs perturbed counterparts of the LFEs by injecting external perturbations into them, and then jointly trains the perturbed embeddings and the raw embeddings of high-frequency words with a Cramér GAN. Experimental results demonstrate that the proposed model can effectively improve the quality of cross-lingual word embeddings.
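
The two ideas above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not say what kind of perturbation is injected or how back-translation is scored, so this sketch assumes additive Gaussian noise, a frequency-sorted vocabulary with a hypothetical `cutoff` rank separating high- from low-frequency words, and plain linear mappings `w_ts` (target to source) and `w_st` (source to target). All function and parameter names are hypothetical, and the GAN training itself is omitted.

```python
import random


def perturb_low_frequency(embeddings, cutoff, sigma=0.01, seed=0):
    """Return a copy of `embeddings` with noise added to low-frequency rows.

    Assumption: rows are sorted by descending word frequency, so rows with
    index >= cutoff are treated as low-frequency embeddings (LFEs).
    Assumption: the perturbation is zero-mean Gaussian noise with std `sigma`.
    """
    rng = random.Random(seed)
    perturbed = []
    for rank, vector in enumerate(embeddings):
        if rank < cutoff:
            # High-frequency word: keep the raw embedding.
            perturbed.append(list(vector))
        else:
            # Low-frequency word: build its perturbed counterpart.
            perturbed.append([x + rng.gauss(0.0, sigma) for x in vector])
    return perturbed


def apply_map(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(n_cols)]


def back_translation_error(targets, w_ts, w_st):
    """Mean squared error after mapping target -> source -> target.

    Assumption: both mappings are linear; a small error means the round trip
    preserves the target-side embeddings.
    """
    total, count = 0.0, 0
    for y in targets:
        round_trip = apply_map(apply_map(y, w_ts), w_st)
        total += sum((a - b) ** 2 for a, b in zip(round_trip, y))
        count += len(y)
    return total / count
```

In a full training loop, the perturbed low-frequency rows and the untouched high-frequency rows would be fed to the discriminator together, and the round-trip error could serve as a reconstruction term alongside the adversarial loss; how the thesis combines these terms is not stated in the abstract.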
Keywords: Cross-lingual word embedding, Adversarial training, Wasserstein GAN, Cramér GAN