The Research On Learning Cross-lingual Word Embeddings Based On Adversarial Training

Posted on:2021-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Li

GTID:2428330614460375

Subject:Computer application technology

Abstract/Summary:

Cross-lingual word embeddings (CLWEs) aim to learn the word embedding space of a low-resource target language with the help of the word embedding space of a resource-rich source language. Recently, generative adversarial networks (GANs) have been successfully applied to learn cross-lingual word embeddings in an unsupervised manner. GAN-based CLWE models treat the source and target word embeddings as two distributions and attempt to align the source embedding distribution with the target embedding distribution. This dissertation focuses on cross-lingual word embedding research based on adversarial training. The main contributions are as follows:

(1) Aligning cross-lingual word embeddings requires abundant target-side information as a reliable standard. However, existing GAN-based CLWE models generally overlook this point and fail to exploit the target-side information effectively, which may lead to suboptimal cross-lingual word embeddings. To address this problem, a novel model, a Wasserstein GAN based on an autoencoder with back-translation, is proposed; it establishes a reliable standard for the alignment process by reusing the target-side information. The proposed model first trains an autoencoder-based Wasserstein GAN to learn preliminary bidirectional cross-lingual mappings, and then performs back-translation on the target side using the obtained bidirectional mappings. Experimental results on three language pairs demonstrate the effectiveness of the proposed model.

(2) Compared with high-frequency words, low-frequency words carry poor semantic information. Their embeddings are therefore more difficult to align, and they disturb the alignment process. To tackle the distributional disturbance caused by the embeddings of low-frequency words (LFEs), a perturbed Cramér GAN is proposed for learning cross-lingual word embeddings. The proposed model constructs perturbed counterparts of the LFEs by injecting external perturbations into them, and then jointly trains on the perturbed embeddings and the raw embeddings of high-frequency words with a Cramér GAN. Experimental results demonstrate that the proposed model can effectively improve the quality of cross-lingual word embeddings.
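
The perturbation step in contribution (2) can be illustrated with a minimal sketch. The abstract does not say what form the external perturbation takes or how low-frequency words are selected, so everything here is an assumption made for illustration: the perturbation is additive Gaussian noise, the embeddings are ordered by descending word frequency, and the names `perturb_low_frequency`, `freq_rank_cutoff`, and `sigma` are hypothetical. The Cramér GAN training itself is not shown.

```python
import random


def perturb_low_frequency(embeddings, freq_rank_cutoff, sigma=0.01, seed=0):
    """Return a copy of `embeddings` with noise added to low-frequency rows.

    Assumptions (not taken from the thesis):
    - `embeddings` is a list of vectors sorted by descending word frequency;
    - rows with rank >= `freq_rank_cutoff` count as low-frequency (LFEs);
    - the "external perturbation" is zero-mean Gaussian noise with std `sigma`.
    """
    rng = random.Random(seed)
    mixed = []
    for rank, vec in enumerate(embeddings):
        if rank < freq_rank_cutoff:
            # High-frequency word: keep the raw embedding unchanged.
            mixed.append(list(vec))
        else:
            # Low-frequency word: build its perturbed counterpart.
            mixed.append([x + rng.gauss(0.0, sigma) for x in vec])
    return mixed


# Toy data: four 2-dimensional embeddings, most frequent word first.
emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
mixed = perturb_low_frequency(emb, freq_rank_cutoff=2, sigma=0.05)
```

In this sketch, `mixed` is the combined set (raw high-frequency embeddings plus perturbed LFEs) that a GAN-based aligner would then be trained on jointly.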

Keywords/Search Tags:

Cross-lingual word embedding, Adversarial training, Wasserstein GAN, Cramér GAN

Related items

1	Research On Machine Reading Comprehension Model Based On Cross-lingual Transfer Technology
2	Cross-Lingual Text Classification Based On Monolingual Word Embedding Mapping Without Parallel Corpus
3	Research On Cross-lingual Word Embedding Construction Methods Based On Deep Semantics
4	Research On Unsupervised Cross-lingual Word Embedding Model Based On Feedback System
5	Research On Mongolian-Chinese Cross-Lingual Word Embedding Learning Based On BERT
6	Research On Cross-lingual Word Similarity Computation
7	Research On Unsupervised Cross-lingual Mappings Of Word Embeddings
8	Research On Chinese-Korean Cross-lingual Text Classification Method Based On Bilingual Topical Word Embedding Model
9	Unsupervised Cross-lingual Word Representation Learning Method Based On Co-training
10	The Research On Cross-lingual Speaker Recognition Based On Language-adversarial Training