
Research And Implementation On Uyghur-Chinese Neural Machine Translation

Posted on: 2020-01-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Yang
GTID: 2428330596975114 | Subject: Computer Science and Technology
Abstract/Summary:
China is composed of 56 ethnic groups, which differ considerably in culture and in level of economic development. Economic and cultural exchange among these groups is indispensable to realizing the "Chinese Dream" of the great rejuvenation of the Chinese nation, yet differences in language have become the primary obstacle to such exchange. Translation is the main way to overcome this obstacle, and because professional human translation cannot meet the demand, machine translation has become an important means of doing so.

Uyghur, the language of the Uyghur people, shares two characteristics with many other minority languages: scarce parallel corpora and a complex grammatical structure. It can therefore serve as an entry point for research on machine translation between minority languages and Chinese, and the results can offer ideas and methods for translating other minority languages in the future.

Most existing research on Uyghur-Chinese translation uses traditional phrase-based statistical machine translation (SMT) models trained on small-scale parallel corpora. These studies follow traditional statistical machine learning ideas and methods and have achieved some results, whereas research on Uyghur-Chinese neural machine translation (NMT) is still at an exploratory stage. This thesis therefore departs from the statistical approach and adopts a neural-network-based technical route. To address the difficulties of Uyghur-Chinese translation, it completes the following work:

(1) A survey of the theory and techniques of neural machine translation. The thesis first sets out the related concepts of NMT, then introduces the main models and techniques, and finally describes in detail the methods and tools for automatic evaluation of machine translation.

(2) A Self-Attention-based language modeling method and a translation model with two encoders and two decoders. To cope with the shortage of parallel corpora, a Self-Attention-based language model is proposed that makes full use of monolingual corpora to improve translation results. In addition, to fully extract the features of both languages during translation, the thesis proposes a translation model with two encoders and two decoders, together with a training algorithm that incorporates the back-translation process. These are applied to the Uyghur-Chinese neural machine translation task.

(3) A comparison and analysis of several models on the Uyghur-Chinese translation task. Using a small-scale Uyghur-Chinese parallel corpus together with Uyghur and Chinese monolingual corpora, experiments compare phrase-based SMT, NMT with supervised learning, NMT with unsupervised learning, and NMT with semi-supervised learning.

Finally, building on the above innovations and other related research, a Uyghur-Chinese neural machine translation system with a browser/server (B/S) architecture was implemented. The experimental results show that the proposed model and algorithm effectively improve the quality of Uyghur-Chinese machine translation.
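The abstract refers to methods and tools for automatic evaluation of machine translation without naming a metric here. Assuming BLEU, the most widely used such metric, a minimal sentence-level version can be sketched as follows: clipped n-gram precisions up to 4-grams, combined by a geometric mean with uniform weights and multiplied by a brevity penalty. The function names are hypothetical and no smoothing is applied; published scores are normally computed at corpus level with an established tool rather than with code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Hypothetical sentence-level BLEU: uniform weights, no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 if any n-gram precision is zero.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties go to the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, scoring `"the cat sat on the mat".split()` against the single reference `"the cat sat on a mat".split()` gives clipped precisions 5/6, 3/5, 2/4 and 1/3 with equal lengths, so the score is (1/12)^(1/4), about 0.537.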
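The abstract does not specify the architecture of its Self-Attention language model. As an assumption, the core operation of a Transformer-style self-attention language model is scaled dot-product attention with a causal mask: the representation at each position depends only on that position and earlier tokens, which is what lets such a model be trained by next-token prediction on monolingual text alone. The sketch below is single-head and uses plain Python lists; the projection matrices `w_q`, `w_k`, `w_v` and all function names are hypothetical, and multi-head attention, feed-forward layers and the output softmax over the vocabulary are omitted.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def causal_self_attention(x, w_q, w_k, w_v):
    """Hypothetical single-head scaled dot-product self-attention, causal mask.

    x: seq_len x d_model input embeddings.
    w_q, w_k, w_v: d_model x d_k projection matrices (assumed, not from the thesis).
    Position t attends only to positions 0..t.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    out = []
    for t, q_t in enumerate(q):
        # Scores against positions 0..t only; later positions are masked out
        # simply by never being scored.
        scores = [
            sum(a * b for a, b in zip(q_t, k[j])) / math.sqrt(d_k)
            for j in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out.append(
            [sum(w * v[j][i] for j, w in enumerate(weights)) for i in range(len(v[0]))]
        )
    return out
```

With identity projections and `x = [[1.0, 0.0], [0.0, 1.0]]`, the first output row is exactly `[1.0, 0.0]`, because position 0 can attend only to itself; changing the second token leaves that row unchanged.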
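The abstract states that the two-encoder, two-decoder model is trained with an algorithm that incorporates back-translation, but gives no details. Under that assumption, the sketch below shows only a generic iterative back-translation loop: each translation direction is retrained on the real parallel pairs plus synthetic pairs whose source side was produced by the opposite-direction model from genuine monolingual text. It uses two independent direction models rather than the thesis's two-encoder, two-decoder architecture, and `ToyWordModel` is a hypothetical word-by-word lookup table standing in for a real NMT model so that the data flow runs end to end. In the Uyghur-Chinese setting, `src` would be Uyghur and `tgt` Chinese.

```python
from typing import Dict, List, Tuple

Sentence = List[str]
Pair = Tuple[Sentence, Sentence]  # (source, target)


class ToyWordModel:
    """Hypothetical stand-in for an NMT model: a word-by-word lookup table."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def train(self, pairs: List[Pair]) -> None:
        # Record position-aligned word pairs; setdefault keeps the first mapping
        # seen, so mappings learned from real parallel data are never overwritten.
        for src, tgt in pairs:
            for s, t in zip(src, tgt):
                self.table.setdefault(s, t)

    def translate(self, sentence: Sentence) -> Sentence:
        # Unknown words are copied through unchanged.
        return [self.table.get(w, w) for w in sentence]


def iterative_back_translation(
    parallel: List[Pair],
    mono_src: List[Sentence],
    mono_tgt: List[Sentence],
    rounds: int = 2,
) -> Tuple[ToyWordModel, ToyWordModel]:
    """Train src->tgt (fwd) and tgt->src (bwd) models with back-translation."""
    fwd, bwd = ToyWordModel(), ToyWordModel()
    reversed_parallel = [(t, s) for s, t in parallel]
    fwd.train(parallel)
    bwd.train(reversed_parallel)
    for _ in range(rounds):
        # Synthetic source from the reverse model, genuine target text.
        synthetic_fwd = [(bwd.translate(t), t) for t in mono_tgt]
        synthetic_bwd = [(fwd.translate(s), s) for s in mono_src]
        fwd.train(parallel + synthetic_fwd)
        bwd.train(reversed_parallel + synthetic_bwd)
    return fwd, bwd
```

The key design point of back-translation is that the target side of every synthetic pair is genuine text, so each decoder keeps learning to produce fluent output even when the synthetic source side is noisy.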
Keywords: neural machine translation, monolingual corpus, parallel corpus, language models, translation models, back-translation process