Research Of Optimization Methods Integration And Translation Rerank For Mongolian-chinese Machine Translation

Posted on: 2018-10-23 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J Wu | Full Text: PDF
GTID: 1318330542480078 | Subject: Computer application technology
Abstract/Summary:
Machine translation (MT) has advanced rapidly in recent years, and low-resource and minority-language MT have received growing attention. Mongolian is an official language of the Inner Mongolia Autonomous Region and is also widely used in many other countries and regions. Research on Mongolian-Chinese MT matters for two reasons: it supports cultural communication among ethnic groups, and it informs MT research on other low-resource languages. However, Mongolian-Chinese MT faces many obstacles: a large linguistic distance between the two languages, complex Mongolian morphology, scarce parallel corpora and tools, and a weak research foundation.

The template-based machine translation (TBMT) model, the statistical machine translation (SMT) model, and the neural machine translation (NMT) model represent successive stages in the development of MT. This thesis systematically optimizes all three models for the Mongolian-Chinese translation task using multiple algorithms. To make full use of the limited Mongolian-Chinese resources and tools, it also proposes a reranking algorithm that combines the outputs of the three systems at the sentence level to further improve translation quality. The main contributions are:

1. To address the data sparsity and word recognition difficulties caused by complex Mongolian morphology, the thesis proposes several Mongolian morphological analysis strategies and verifies them through extensive comparative experiments on the three translation models. The comparison yields a different strategy for each model: in the SMT model, Mongolian stems are used as the translation granularity; in the NMT model, both Mongolian stems and suffixes are used as subwords during training; in the TBMT model, multi-suffix morphological analysis is applied to perform fuzzy matching.

2. The thesis proposes a novel realignment algorithm to optimize the SMT model. The framework uses a finer granularity for alignment and a coarser granularity for translation rule induction. First, Mongolian stems and Chinese characters are aligned at the finer granularity to alleviate data sparsity. The result is then realigned to a coarser-granularity alignment between Mongolian and Chinese words, which is used to extract translation rules and to decode. The framework improves alignment quality and translation performance at the same time.

3. The thesis builds an attention-based bidirectional recurrent neural network translation model for Mongolian-Chinese MT. It proposes a joint phrase and character training method that gives the NMT decoder features at multiple granularities and reduces the number of out-of-vocabulary words. Because it expands the training data and the vocabulary with representations at multiple granularities, this approach is particularly effective for low-resource translation.

4. The thesis builds a Mongolian-Chinese TBMT system consisting of a template extraction model and a template translation model. It proposes a novel method that extracts templates automatically by aligning and abstracting static words from bilingual parallel examples, and a method that filters out low-quality TBMT translations to strengthen the combined system. Multi-suffix morphological analysis is also applied to perform fuzzy matching.

5. To make full use of the translation results, the thesis proposes a translation rerank model that combines the above translation systems. The rerank model uses two recurrent neural networks to encode the source and target sentences and measures their cross-lingual similarity.

In summary, the thesis studies multiple Mongolian-Chinese machine translation models and optimizes each in a different way. The optimization algorithms address the main obstacles of Mongolian-Chinese machine translation, including complex Mongolian morphology, scarce resources, data sparsity, and feature loss. The proposed algorithms and optimization methods improve the performance of Mongolian-Chinese machine translation.
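The realignment idea in item 2 (align at a finer granularity, then map the result back to words) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the projection rule used here (two words are linked if any of their fine-grained units are linked) is an assumption, and the function names `unit_to_word_map` and `realign`, the placeholder stems, and the example sentence are all hypothetical.

```python
def unit_to_word_map(words_as_units):
    """Map each fine-grained unit index to the index of the word containing it."""
    mapping = []
    for word_idx, units in enumerate(words_as_units):
        mapping.extend([word_idx] * len(units))
    return mapping


def realign(fine_links, src_words_as_units, tgt_words_as_units):
    """Project fine-grained alignment links onto word-level links.

    fine_links: iterable of (source_unit_index, target_unit_index) pairs,
    e.g. produced by a word aligner run on stems and characters.
    Assumption: a word pair is linked if any of its units are linked.
    """
    src_map = unit_to_word_map(src_words_as_units)
    tgt_map = unit_to_word_map(tgt_words_as_units)
    return sorted({(src_map[i], tgt_map[j]) for i, j in fine_links})


# Hypothetical example: three source words, each reduced to one placeholder
# stem, and three Chinese words split into five characters.
src = [["stem_a"], ["stem_b"], ["stem_c"]]
tgt = [["我"], ["喜", "欢"], ["读", "书"]]
fine = [(0, 0), (1, 3), (1, 4), (2, 1), (2, 2)]

word_links = realign(fine, src, tgt)
print(word_links)  # [(0, 0), (1, 2), (2, 1)]
```

The word-level links produced this way are what a phrase-based SMT pipeline would then consume for rule extraction; a real system might instead weight or vote over the fine-grained links rather than take their union.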
Keywords/Search Tags:Mongolian-Chinese Machine Translation, Neural Network Machine Translation, Statistical Machine Translation, Template-based Machine Translation, Translation Rerank