
Cross-Lingual Text Classification Based On Monolingual Word Embedding Mapping Without Parallel Corpus

Posted on: 2020-11-06  Degree: Master  Type: Thesis
Country: China  Candidate: N Wang  Full Text: PDF
GTID: 2428330575989342  Subject: Computer software and theory
Abstract/Summary:
Text classification is now a common task in natural language processing, but a classifier trained on one language cannot be applied directly to another, because word representations are language-specific. Training a separate classification model for each language costs considerable time and effort; moreover, text classification is a supervised learning task that requires large numbers of labeled training samples, so a monolingual classifier may fail on low-resource languages. In addition, mainstream cross-lingual word embedding models depend on costly parallel corpora, which limits transfer between languages. To address these problems, this thesis conducts in-depth research on text classification and cross-lingual word embeddings, and proposes a monolingual neural classification model with an attention mechanism and two cross-lingual text classification methods that require no parallel corpus, as follows:

(1) For monolingual text classification, this thesis proposes a bi-directional GRU neural network model and introduces an attention mechanism into it. Compared with traditional machine learning methods, the bi-directional GRU classifier with attention achieves improvements of varying degrees, so this model is also used as the classifier in the cross-lingual experiments (a sketch of the architecture follows the abstract).

(2) For cross-lingual text classification via word embedding mapping with non-parallel corpora, this thesis constructs a bilingual word embedding space from two monolingual embeddings. Building on current research on adversarial learning, Procrustes analysis and Cross-domain Similarity Local Scaling (CSLS) are introduced to fine-tune the mapping obtained by adversarial learning, so that translation pairs lie as close as possible in the bilingual embedding space. Secondly, Procrustes analysis and CSLS are also applied in a self-learning training procedure that iteratively adjusts the mapping matrix until convergence (sketches of both steps follow the abstract). Comparative experiments against BilBOWA, Google Translate, and a variant without CSLS show that the orthogonal constraint and CSLS each improve classification performance, and that using both together gives the best results; the self-learning method with CSLS and the orthogonal constraint nearly catches up with methods that use parallel corpora.
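The monolingual classifier in (1) combines a bi-directional GRU with additive attention over the hidden states. Below is a minimal sketch of such an architecture, assuming PyTorch; the vocabulary size, dimensions, class count, and all names are illustrative, not values taken from the thesis.

    # Minimal sketch of a bi-directional GRU classifier with additive attention
    # (assumes PyTorch; all hyperparameters below are illustrative).
    import torch
    import torch.nn as nn

    class BiGRUAttentionClassifier(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64, num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            # Additive attention: score each time step, then take a weighted sum.
            self.attn = nn.Linear(2 * hidden_dim, 1)
            self.fc = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, token_ids):                      # (batch, seq_len)
            h, _ = self.gru(self.embedding(token_ids))     # (batch, seq_len, 2*hidden)
            weights = torch.softmax(self.attn(h), dim=1)   # attention over time steps
            context = (weights * h).sum(dim=1)             # (batch, 2*hidden)
            return self.fc(context)                        # class logits

    # Usage example on random token ids:
    logits = BiGRUAttentionClassifier()(torch.randint(0, 10000, (4, 20)))

The attention weights let the classifier focus on the most discriminative words in a sentence instead of relying only on the final GRU state.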
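The Procrustes fine-tuning step in (2) solves a standard problem: given matrices X and Y of paired source and target word vectors, find the orthogonal matrix W minimizing ||XW - Y||_F, whose closed-form solution is W = UV^T with U S V^T the SVD of X^T Y. A minimal NumPy sketch follows; in the thesis the pairs come from a dictionary induced by adversarial learning rather than a parallel corpus, and the matrix names here are illustrative.

    # Minimal sketch of the orthogonal Procrustes step (assumes NumPy).
    # X, Y: (n, d) arrays of paired source/target word vectors.
    import numpy as np

    def procrustes(X, Y):
        # W = U V^T, where U S V^T is the SVD of X^T Y, minimizes ||XW - Y||_F
        # over orthogonal W.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt   # orthogonal mapping matrix, shape (d, d)

The orthogonality of W is exactly the "orthogonal constraint" the experiments ablate: it preserves distances within each monolingual space while aligning the two spaces.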
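CSLS, used both to fine-tune the mapping and inside the self-learning loop, rescales plain cosine similarity by the density of each word's cross-domain neighborhood, penalizing "hub" words that are near everything. Below is a minimal sketch assuming row-normalized embedding matrices; the neighborhood size k=10 follows common practice and is not a value stated in the abstract.

    # Minimal sketch of Cross-domain Similarity Local Scaling (CSLS).
    # src: (n_s, d), tgt: (n_t, d), both row-normalized (assumes NumPy).
    import numpy as np

    def csls_scores(src, tgt, k=10):
        sims = src @ tgt.T                                  # cosine similarities
        # Mean similarity of each word to its k nearest cross-domain neighbors.
        r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)  # per source word
        r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)  # per target word
        # CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y)
        return 2 * sims - r_src[:, None] - r_tgt[None, :]

In the self-learning procedure, the highest-scoring CSLS pairs form the dictionary for the next Procrustes step, and the loop repeats until the mapping matrix converges.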
Keywords/Search Tags:Text Classification, Cross-Lingual Word Embedding, Attention Mechanism, Cross-domain Similarity Local Scaling, Procrustes Analysis