A Shared Semantic Space Approach For Unsupervised Bilingual Lexicon Induction

Posted on: 2020-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: X F Bai
Full Text: PDF
GTID: 2428330590974432
Subject: Computer Science and Technology
Abstract/Summary:
Unsupervised bilingual lexicon induction aims to generate bilingual lexicons without any cross-lingual signals. Solving this problem would not only facilitate many cross-lingual tasks but also benefit low-resource languages. Researchers have recently made great progress on unsupervised bilingual lexicon induction, and the resulting lexicons have been successfully applied to many downstream tasks. However, existing work has two shortcomings: (1) the models are theoretically sub-optimal and their performance is not ideal; (2) the approaches are not robust in real scenarios, especially for language pairs that differ greatly. To address these shortcomings, this thesis carries out two pieces of work to improve the quality of induced unsupervised bilingual lexicons.

First, this thesis proposes an unsupervised framework, based on a shared semantic space, for inducing bilingual lexicons. In contrast to existing frameworks, which learn a direct cross-lingual mapping of word embeddings from the source language to the target language, the proposed framework builds a shared semantic space into which both the source and the target language embeddings are mapped. In theory, a model based on a shared semantic space is more expressive than existing models, so the word embeddings of the two languages can be matched better, which helps unsupervised bilingual lexicon induction. Extensive experiments across 8 language pairs demonstrate that the proposed method significantly outperforms existing adversarial methods and even achieves the best published results on several language pairs.

Second, building on the proposed shared semantic space model, this thesis systematically studies other factors that affect the performance of unsupervised bilingual lexicon induction. Although existing work has mentioned these factors, systematic experiments are lacking. By experimenting with and analyzing these factors, this thesis further improves the proposed shared semantic space model, making it better in both performance and robustness.
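
The contrast between a direct mapping and a shared semantic space can be illustrated with a small NumPy sketch. It is not the model from the thesis: the abstract does not specify the architecture or the training objective, so the maps here (`W`, `W_src`, `W_tgt`) are hypothetical and are built from a known rotation on toy data rather than learned without supervision. The helper `induce_lexicon` is likewise an assumed name, and it uses plain cosine nearest-neighbour retrieval as one common way to read off word pairs once the embeddings are aligned.

```python
import numpy as np


def normalize(m):
    """Scale each row to unit length so dot products are cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def induce_lexicon(src_proj, tgt_proj):
    """Return, for each source row, the index of the most similar target row."""
    sims = normalize(src_proj) @ normalize(tgt_proj).T
    return sims.argmax(axis=1)


rng = np.random.default_rng(0)
dim, n_words = 4, 5

# Toy data (assumption): the target embeddings are a rotated copy of the
# source embeddings, so source word i corresponds to target word i.
src = rng.normal(size=(n_words, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ rotation

# Direct mapping: a single matrix W carries source vectors into the
# target space, and retrieval happens there.
W = rotation
direct_pairs = induce_lexicon(src @ W, tgt)

# Shared space: each language has its own map into a third space.
# W_src is an arbitrary orthogonal matrix; W_tgt is chosen so that both
# languages land on the same points (tgt @ W_tgt equals src @ W_src).
W_src, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
W_tgt = rotation.T @ W_src
shared_pairs = induce_lexicon(src @ W_src, tgt @ W_tgt)

print(direct_pairs, shared_pairs)
```

In this toy setup both variants recover the same pairing, because the data were constructed to be perfectly alignable. The sketch only shows where the two formulations differ structurally: one map into the target space versus two maps into a common space.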
Keywords/Search Tags: Bilingual lexicon induction, Unsupervised learning, Shared semantic space