
Research and Implementation of Low-Resource Neural Machine Translation Based on Zero-Shot Learning

Posted on: 2022-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Liu
Full Text: PDF
GTID: 2518306542455614
Subject: Engineering
Abstract/Summary:
Neural machine translation (NMT) is a recently developed approach to automatic translation based on deep neural networks. Current machine translation research demands large parallel corpora, because NMT systems learn probabilistic models from features extracted from parallel data. To learn useful features, the sentences in the corpus must be of good quality; in particular, parallel sentences must convey the same meaning. Bilingual corpora of good quality and moderate scale are now available for the world's relatively common high-resource languages, but many countries and regions lack such corpora. This thesis focuses on this problem, starting from the NMT training process: we modify the loss function of the training procedure and propose two pivot-based zero-shot machine translation methods that can be applied effectively to build Azerbaijani-to-Chinese translation systems, providing a reference for research on low-resource neural machine translation. Our main contributions are as follows:

(1) We propose a word-level training procedure for pivot-based NMT. A machine translation system is pre-trained on the English-Chinese language pair, and the probability output of this model is combined with the original maximum likelihood estimation objective. By applying knowledge distillation to NMT, a zero-shot machine translation system from Azerbaijani to Chinese is constructed using only an Azerbaijani-English bilingual corpus. Our experiments show that, compared with the standard pivot-based method, our approach achieves better performance and produces proper translations.

(2) We propose a sequence-level training procedure for pivot-based NMT. NMT systems are usually trained to minimize the word-level negative log-likelihood at each position, which cannot measure or evaluate the relationship between whole sequences. An overall estimate of the sequence distribution therefore allows the pre-trained NMT model to transfer a wider range of knowledge, and this sequence distribution is highly valuable for NMT. To address the exponential time complexity of evaluating the full distribution over output sequences in pivot-based NMT, we propose a sampling algorithm that reduces the sample space and estimates the target-sequence distribution more directly. Experimental results show that the sequence-level zero-shot neural translation method greatly improves the quality of zero-resource machine translation.

(3) Combining the proposed sequence-level training method with the open-source Nginx HTTP server, we build an Azerbaijani-to-Chinese machine translation system that can serve as a reference for Azerbaijani-to-Chinese translation.
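The two training procedures above can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch example, not the thesis implementation: the loss weights `alpha` and `temperature`, the sample count `num_samples`, and the `teacher.sample`/`student(...)` interfaces are assumptions introduced for illustration. It shows a word-level distillation loss that interpolates the maximum-likelihood term with the pre-trained teacher's output distribution, and a sequence-level variant that approximates the teacher's sequence distribution by sampling a few candidate translations rather than enumerating the exponentially large output space.

```python
# Minimal, hypothetical sketch of the two pivot-based distillation losses
# described above (PyTorch). The interfaces, alpha, temperature, and
# num_samples are illustrative assumptions, not the thesis code.
import torch
import torch.nn.functional as F


def word_level_kd_loss(student_logits, teacher_logits, target_ids,
                       alpha=0.5, temperature=1.0, pad_id=0):
    """Interpolate word-level NLL with KL divergence to the teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids: (batch, seq_len) reference targets (e.g., produced via the pivot path)
    """
    vocab = student_logits.size(-1)
    # Original maximum likelihood estimation term.
    nll = F.cross_entropy(student_logits.view(-1, vocab),
                          target_ids.view(-1),
                          ignore_index=pad_id)
    # Distillation term: match the pre-trained (English->Chinese) teacher's
    # per-word output distribution.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return (1.0 - alpha) * nll + alpha * kd


def sequence_level_kd_loss(student, teacher, src_ids, num_samples=5):
    """Approximate the teacher's sequence distribution by sampling.

    Instead of summing over every possible output sequence (exponential cost),
    draw a small number of candidate translations from the teacher and train
    the student to assign them high probability.
    """
    with torch.no_grad():
        # Assumed interface: returns token ids of shape (batch, num_samples, seq_len).
        samples = teacher.sample(src_ids, num_samples=num_samples)
    loss = 0.0
    for k in range(num_samples):
        tgt = samples[:, k, :]
        logits = student(src_ids, tgt)            # (batch, seq_len, vocab)
        vocab = logits.size(-1)
        loss = loss + F.cross_entropy(logits.view(-1, vocab), tgt.view(-1))
    return loss / num_samples
```

In this sketch, the word-level loss reduces to standard training when `alpha = 0`, while the sequence-level loss depends only on teacher samples, which matches the zero-shot setting in which no direct Azerbaijani-Chinese parallel data is available.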
Keywords/Search Tags:NMT, low-resource, zero-shot learning, Transformer