
Unsupervised Cross-lingual Word Representation Learning Method Based On Co-training

Posted on: 2022-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Su
GTID: 2518306572959899
Subject: Computer technology

Abstract/Summary:
Cross-lingual word embedding places the embeddings of words from different languages in the same vector space, so that the similarity between words from different languages can be measured easily. Unsupervised cross-lingual word representation learning aims to learn cross-lingual word embeddings without any external cross-lingual information. Although existing unsupervised cross-lingual word representation learning methods have achieved certain results, many shortcomings remain. One of them is that the bilingual translation dictionary acquisition method in the self-learning step is too simple, so it cannot provide high-confidence bilingual information for the subsequent iterative steps; this weakens the self-learning process and ultimately degrades the quality of the resulting cross-lingual word embeddings.

To solve this problem, this paper proposes an unsupervised cross-lingual word representation learning method based on co-training, so as to improve the quality of cross-lingual word representations. The idea is to compare the bilingual translation dictionaries used in the self-learning steps of the different training sub-processes of the co-training procedure and to select the more credible bilingual translation pairs for the subsequent training steps of each sub-process, thereby improving the quality of the information used during training and ultimately improving the model's performance. Specifically, this paper designs an unsupervised cross-lingual word representation co-training method based on different word embedding models and another co-training method based on different corpus sources, and both of them outperform the baseline model.

This paper also explores principal component analysis based on a linear autoencoder, and realizes a principal component acquisition method based on the linear autoencoder for the pointwise mutual information matrix obtained from the monolingual corpus. On this basis, a cross-lingual word representation co-training method based on the linear autoencoder is designed, which improves the effect of cross-lingual word embedding learning and further verifies the feasibility of the co-training approach for unsupervised cross-lingual word representation learning.
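The following Python sketch is not code from the thesis; the helper names, array shapes, and the nearest-neighbour dictionary induction are illustrative assumptions. It shows the kind of agreement-based filtering described above: each training sub-process induces a bilingual dictionary from its own mapped embedding space, and only the translation pairs on which both sub-processes agree are kept as high-confidence input for the next self-learning iteration.

```python
# Minimal sketch of agreement-based dictionary filtering for co-training.
# Two sub-processes (e.g. different embedding models or corpus sources) each
# induce a bilingual dictionary; only pairs they agree on are kept.
import numpy as np


def induce_dictionary(src_emb: np.ndarray, tgt_emb: np.ndarray) -> dict[int, int]:
    """Map each source word index to its nearest target word index by cosine similarity."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                      # (n_src, n_tgt) cosine similarities
    return {i: int(j) for i, j in enumerate(sims.argmax(axis=1))}


def agreed_pairs(dict_a: dict[int, int], dict_b: dict[int, int]) -> dict[int, int]:
    """Keep only the translation pairs that both sub-processes induce identically."""
    return {s: t for s, t in dict_a.items() if dict_b.get(s) == t}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy embeddings standing in for the two sub-processes' mapped spaces.
    src_a, tgt_a = rng.normal(size=(100, 50)), rng.normal(size=(120, 50))
    src_b = src_a + 0.05 * rng.normal(size=(100, 50))
    tgt_b = tgt_a + 0.05 * rng.normal(size=(120, 50))

    high_conf = agreed_pairs(induce_dictionary(src_a, tgt_a),
                             induce_dictionary(src_b, tgt_b))
    print(f"{len(high_conf)} high-confidence pairs kept for the next self-learning step")
```

In the abstract's two variants, the sub-processes differ either in the word embedding model or in the corpus source; the agreement-based filtering step itself is the same in both.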
Keywords/Search Tags: cross-lingual word representation, co-training, unsupervised learning