
Research On Cross-lingual Word Embedding Construction Methods Based On Deep Semantics

Posted on: 2022-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Peng
Full Text: PDF
GTID: 2518306761496544
Subject: Automation Technology
Abstract/Summary:
In recent years, with the emergence of deep learning and neural network technologies, word embedding has become an important foundational tool in many natural language processing tasks. A word embedding represents a word as a low-dimensional real-valued vector so as to better capture its semantic information. Cross-lingual word embeddings are a natural extension of monolingual word embeddings: with the help of transfer learning techniques, they allow semantic inference in a multilingual environment and can model the structural and semantic relations between different languages. How to construct high-quality cross-lingual word embeddings has therefore become a research hotspot in recent years.

There is now abundant research on cross-lingual word embeddings. The most common approach is mapping-based: it learns cross-lingual word embeddings by mapping two pre-trained monolingual embedding spaces into a common semantic space. However, because languages differ (different alphabets, dissimilar grammatical structures), the underlying isomorphism assumption does not hold for all language pairs. In addition, these mapping methods typically start from pre-trained Word2vec word vectors, which consider only the local context information of the corpus and neglect its global information. In view of these problems, the main contributions of this thesis are as follows:

1. This thesis proposes a cross-lingual word embedding construction method based on topical monolingual word embeddings. Unlike previous studies, which consider only local context information, this method uses both the local context information and the global information of the corpus: latent topics carrying global information are fused into an improved Word2vec model to learn high-quality monolingual word embeddings. Supervised and unsupervised methods are then used to train cross-lingual word embeddings with topic information. To verify the effectiveness of the proposed method, experiments are conducted on a standard information retrieval evaluation dataset. The results show that cross-lingual word embeddings built on topical monolingual word embeddings express semantics more fully and improve the accuracy of natural language processing tasks.

2. Building on the above, this thesis further addresses the isomorphism assumption and proposes a cross-lingual word embedding construction method based on an enhanced seed bilingual dictionary. First, the monolingual word embeddings of the two languages are pre-trained separately; then the seed bilingual dictionary is trained and enhanced using the Triplet loss as the learning objective; finally, the enhanced dictionary serves as the supervision signal for constructing the cross-lingual word embeddings. Experimental results on real datasets demonstrate that the method outperforms the baseline methods and effectively improves the quality of cross-lingual word embeddings.
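The mapping-based construction described above is commonly solved with the orthogonal Procrustes method: given embedding pairs from a seed dictionary, a single orthogonal matrix is learned that rotates the source space onto the target space. The sketch below is a minimal illustration of that general technique (not the specific model of this thesis), using a synthetic rotation in place of real monolingual embeddings.

```python
import numpy as np

def procrustes_mapping(X, Y):
    """Learn an orthogonal map W minimizing ||XW - Y||_F.

    X: source-language embeddings for dictionary pairs, shape (n, d)
    Y: target-language embeddings for the same pairs, shape (n, d)
    Closed-form solution: W = U V^T, where U S V^T = SVD(X^T Y).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: rotate random "source" vectors by a known orthogonal
# matrix, then recover that rotation from the (X, Y) pairs alone.
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(50, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # ground-truth rotation
Y = X @ Q
W = procrustes_mapping(X, Y)
print(np.allclose(X @ W, Y))  # the learned map aligns the two spaces
```

The orthogonality constraint is what encodes the isomorphism assumption the thesis questions: it preserves distances within each space, so it can only succeed when the two embedding spaces are already near-isometric.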
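The Triplet loss used in contribution 2 to enhance the seed dictionary can be sketched as follows. This is a generic, hedged illustration of the loss itself, not the thesis's exact training procedure: it pulls a translation pair (anchor, positive) together while pushing a non-translation (anchor, negative) at least a margin further away.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors:
    encourage d(anchor, positive) + margin <= d(anchor, negative).
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy check: a well-separated triplet incurs zero loss,
# a confusable one incurs a positive penalty.
a = np.array([1.0, 0.0])     # anchor word vector
good = np.array([1.0, 0.1])  # its translation, close to the anchor
bad = np.array([-1.0, 0.0])  # a non-translation, far from the anchor
print(triplet_loss(a, good, bad))  # 0.0
print(triplet_loss(a, bad, good))  # positive
```

Minimizing this objective over candidate dictionary entries ranks true translation pairs closer than distractors, which is one way a seed bilingual dictionary can be filtered and enlarged before being used as a supervision signal.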
Keywords/Search Tags:Cross-lingual word embedding, Deep semantics, Mapping learning, Topic model, Triplet loss