
Research on the Key Techniques for Neural Machine Translation

Posted on: 2021-04-05    Degree: Master    Type: Thesis
Country: China    Candidate: Y N Chen    Full Text: PDF
GTID: 2518306017472904    Subject: Intelligent Science and Technology
Abstract/Summary:
Machine Translation (MT) uses computer algorithms to convert text automatically between natural languages, and it has high theoretical and practical research value. In recent years, with the development of deep learning, Neural Machine Translation (NMT) has made breakthrough progress and now clearly outperforms traditional Statistical Machine Translation (SMT) systems. Although NMT has greatly improved translation quality in most respects, the translation of long sentences remains unsatisfactory, owing to the complex structure of long sentences and the limited memory capacity of current translation models. In practical tasks such as image translation and speech translation, missing punctuation marks further degrade the translation of long sentences. This thesis focuses on these two problems, aiming to improve the translation quality of long sentences by increasing the memory capacity of the model and by recovering the punctuation marks that are missing in speech translation. The main work and innovations of the thesis are as follows:

1. To improve the memory capacity of the translation model, a Neural Machine Translation model based on the Recurrent Expert Unit (REU) is proposed. The REU enhances the parameter capacity of Recurrent Neural Networks (RNNs) with multiple expert units and uses a context-aware gating function to balance the information flow from the different experts; a top-k gating function is further introduced to reduce computational complexity (an illustrative sketch of this gating mechanism follows the abstract). Machine translation experiments on the WMT17 Chinese-English and WMT14 English-German datasets show that the proposed method significantly improves NMT translation performance, especially on long sentences.

2. To address missing punctuation marks, a punctuation recovery method based on Bidirectional Encoder Representations from Transformers (BERT) and Focal Loss is proposed. The method uses BERT to extract strong semantic features and trains with Focal Loss to alleviate the class imbalance caused by there being far more unpunctuated positions than punctuated ones (a sketch of this loss also follows the abstract). The effectiveness of the punctuation recovery model is verified on speech translation tasks: experiments on a self-built Chinese-English dataset and the IWSLT15 English-German dataset show that the method significantly improves translation quality, especially for long sentences.
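The following is a minimal sketch, not the thesis implementation, of the general mechanism the REU description points to: several expert transforms whose outputs are mixed by a gate conditioned on the current input and the previous hidden state, with only the top-k experts evaluated at each step. All class and variable names (e.g. TopKExpertGate) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKExpertGate(nn.Module):
    """Context-aware top-k mixture of expert transforms over an RNN hidden state."""

    def __init__(self, hidden_size, num_experts=4, k=2):
        super().__init__()
        self.k = k
        # Each expert is a small feed-forward transform; the experts in the
        # thesis's REU may be structured differently.
        self.experts = nn.ModuleList(
            [nn.Linear(2 * hidden_size, hidden_size) for _ in range(num_experts)]
        )
        # The gate is "context-aware": it scores experts from the current
        # input together with the previous hidden state.
        self.gate = nn.Linear(2 * hidden_size, num_experts)

    def forward(self, x, h_prev):
        context = torch.cat([x, h_prev], dim=-1)           # (batch, 2*hidden)
        scores = self.gate(context)                        # (batch, num_experts)
        # Keep only the k highest-scoring experts and renormalize their
        # weights, which reduces the per-step computation.
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # (batch, k)
        h_new = torch.zeros_like(h_prev)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    h_new[mask] = h_new[mask] + weights[mask, slot].unsqueeze(-1) \
                        * torch.tanh(expert(context[mask]))
        return h_new


# Example: one recurrent step over a batch of 8 with hidden size 16.
layer = TopKExpertGate(hidden_size=16, num_experts=4, k=2)
x = torch.randn(8, 16)
h = torch.zeros(8, 16)
h = layer(x, h)
```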
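Similarly, the following is a minimal sketch of Focal Loss for multi-class punctuation tagging, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). The label scheme and the alpha/gamma settings used in the thesis are not given here, so the values below are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Multi-class focal loss; logits: (N, C), targets: (N,) integer labels."""
    log_probs = F.log_softmax(logits, dim=-1)
    # log p_t: log-probability of the true class at each token position.
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # (1 - p_t)^gamma down-weights easy, well-classified positions, so the
    # abundant "no punctuation" class dominates the loss less and the rarer
    # punctuation classes contribute more.
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()


# Example: 5 token positions, 4 labels (none, comma, period, question mark).
logits = torch.randn(5, 4)
targets = torch.tensor([0, 0, 1, 0, 2])
loss = focal_loss(logits, targets)
```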
Keywords/Search Tags:Neural Machine Translation, Speech Translation, Memory Capacity, Punctuation Recovery