
Design And Implementation Of English-Chinese-Uyghur Multi Engine Machine Translation System

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Liu
GTID: 2518306128476674
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet, international and inter-ethnic exchanges are increasing. Faced with a huge volume of multilingual information online, high-performance machine translation systems can help remove communication barriers. In this thesis, English-Chinese and Uyghur-Chinese machine translation models are implemented with two methods, monolingual data is introduced to improve their translation quality, and finally a multi-engine online machine translation system is built.

At present, most of the bilingual parallel corpora used to train neural machine translation models are collected from the Internet. These corpora mix data from multiple domains without clear domain boundaries and are hard to divide by domain, so some domain-specific features are lost when a translation system is trained on them. This thesis uses K-means text clustering to divide the existing bilingual parallel corpus into several categories, each with high internal similarity. The data in each category is used to fine-tune a separate model, and at decoding time the input is translated by the model whose category it is most similar to, which improves Uyghur-Chinese translation.

The second method addresses the data sparsity caused by the shortage of Uyghur-Chinese parallel corpora and the morphological complexity of Uyghur. Segmenting words into syllables marked with BME tags (Begin, B; Middle, M; End, E) improves BLEU (Bilingual Evaluation Understudy) by 2 points on a Transformer model that incorporates an evaluation module. Both methods are also applied to English-Chinese translation, yielding a high-quality English-Chinese model.

To further improve the English-Chinese and Uyghur-Chinese models and make full use of monolingual data, this thesis back-translates Chinese monolingual data, then filters and selects the results to form a pseudo-parallel corpus, which improves the English-Chinese model. Iterative back-translation further improves the Uyghur-Chinese model by 1.53 BLEU, and the Chinese-Uyghur model also improves.

Online machine translation platforms typically combine multiple servers using Nginx + uWSGI + Django. When the servers differ in performance, Nginx can only poll them in turn and cannot allocate tasks according to each server's processing capacity, so the lowest-performance GPU (Graphics Processing Unit) server limits the whole system, and the system can become overloaded under highly concurrent requests. To solve these problems, this thesis measures the online translation performance of the English-Chinese and Uyghur-Chinese models on different GPUs and, based on the results, adds a task-scheduling service interface to the Nginx + uWSGI + Django framework. The system selects an appropriate GPU server according to the size of each request and splits long input texts. This design effectively improves the stability of the multi-engine system.
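The cluster-then-route idea for domain-specific fine-tuning can be illustrated with a small sketch. It assumes whitespace-tokenized source sentences, unit-length bag-of-words vectors, plain Euclidean K-means with deterministic farthest-first seeding, and clustering on the source side only so that new inputs can be routed at decoding time. The thesis does not state its features or seeding, and all function names here are hypothetical.

```python
import math
from collections import Counter

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def bag_of_words(sentence, index):
    """Unit-length term-frequency vector; words missing from index are ignored."""
    v = [0.0] * len(index)
    for w, c in Counter(sentence.split()).items():
        if w in index:
            v[index[w]] = float(c)
    return normalize(v)

def vectorize(sentences):
    """Build a vocabulary from the corpus and vectorize every sentence."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    return [bag_of_words(s, index) for s in sentences], index

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vecs, k, iters=20):
    # Deterministic farthest-first seeding: start from the first vector,
    # then repeatedly add the vector farthest from every chosen seed.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(vecs)),
                         key=lambda i: min(sq_dist(vecs[i], vecs[s]) for s in seeds)))
    centroids = [list(vecs[i]) for i in seeds]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vecs]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def route(sentence, index, centroids):
    """Index of the nearest cluster, i.e. of the fine-tuned model to decode with."""
    v = bag_of_words(sentence, index)
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
```

`route()` only returns a cluster index; in a full system that index would pick the model fine-tuned on that cluster's data. A real corpus would call for sparse TF-IDF features and a library implementation such as scikit-learn's `KMeans` rather than this dense pure-Python loop.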
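The syllable + BME marking can be sketched as follows. This is a hypothetical illustration, not the thesis's segmenter: it assumes lowercase, pre-tokenized Uyghur in Latin script (ULY), applies a common one-vowel-per-syllable heuristic, leaves single-syllable words unmarked, and writes the tags as `_B`/`_M`/`_E` suffixes. All of these are choices made for this example.

```python
VOWELS = set("aeëioöuü")                   # ULY vowel letters
DIGRAPHS = ("ch", "gh", "ng", "sh", "zh")  # ULY consonant digraphs

def letters(word):
    """Split a ULY word into letters, treating digraphs as single consonants."""
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out

def syllabify(word):
    """Heuristic: one vowel per syllable; the consonant right before a vowel
    opens its syllable, any earlier consonants close the previous one."""
    ls = letters(word)
    vowels = [i for i, l in enumerate(ls) if l in VOWELS]
    cuts = [0]
    for a, b in zip(vowels, vowels[1:]):
        cuts.append(b if b - a == 1 else b - 1)  # adjacent vowels: cut between them
    cuts.append(len(ls))
    return ["".join(ls[x:y]) for x, y in zip(cuts, cuts[1:])]

def bme_tokens(word):
    syls = syllabify(word)
    if len(syls) == 1:
        return syls  # assumption: monosyllabic words stay unmarked
    tags = ["B"] + ["M"] * (len(syls) - 2) + ["E"]
    return [f"{s}_{t}" for s, t in zip(syls, tags)]

def segment(sentence):
    return " ".join(t for w in sentence.split() for t in bme_tokens(w))

def desegment(text):
    """Rejoin B/M/E-marked syllables into words (inverse of segment)."""
    words, buf = [], ""
    for tok in text.split():
        tag = tok[-2:]
        if tag in ("_B", "_M"):
            buf += tok[:-2]
        elif tag == "_E":
            words.append(buf + tok[:-2])
            buf = ""
        else:
            if buf:  # flush an unterminated word from noisy model output
                words.append(buf)
                buf = ""
            words.append(tok)
    if buf:
        words.append(buf)
    return " ".join(words)
```

For example, `segment("men kitablar oqughuchi")` yields `men ki_B tab_M lar_E o_B qu_M ghu_M chi_E`. Because every multi-syllable word ends in an `_E` token, `desegment()` can restore word boundaries in Chinese-to-Uyghur output while the syllable vocabulary stays small.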
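The back-translation step can be sketched as below. It assumes the reverse (Chinese-to-source) model is available as a plain function and that both sides are already tokenized (Chinese word-segmented). The length-ratio filter is a hypothetical stand-in, because the abstract does not say which filtering or selection criteria were used.

```python
from typing import Callable, List, Tuple

def length_ratio_ok(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Hypothetical filter: keep a pair only if token counts differ by at most max_ratio."""
    a, b = len(src.split()), len(tgt.split())
    return a > 0 and b > 0 and max(a, b) / min(a, b) <= max_ratio

def back_translate(mono_zh: List[str],
                   zh_to_src: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Build (synthetic source, real Chinese) pairs from Chinese monolingual text."""
    pairs = [(zh_to_src(zh), zh) for zh in mono_zh]
    return [(s, t) for s, t in pairs if length_ratio_ok(s, t)]
```

The synthetic sentence sits on the source side and the real Chinese sentence on the target side, so the pseudo-parallel data trains the source-to-Chinese direction. An iterative variant would retrain the models on the enlarged data and repeat the step with the improved reverse model.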
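A capacity-aware dispatcher of the kind the abstract describes might look like the sketch below. It is an illustration under stated assumptions, not the thesis's scheduling interface: each GPU server's throughput in characters per second is assumed to come from offline measurements, a long text is split at sentence-final punctuation into chunks of at most `max_chars` characters, and each chunk goes to the server with the earliest estimated finish time instead of the next server in turn. `GpuServer`, `split_text`, and `dispatch` are hypothetical names.

```python
import re
from dataclasses import dataclass

# A sentence is a run of non-terminal characters plus its terminal punctuation
# and trailing whitespace; a final unterminated fragment is also kept.
SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$")

@dataclass
class GpuServer:
    name: str
    chars_per_sec: float    # throughput measured offline for this GPU and model
    pending_chars: int = 0  # work queued but not yet finished

    def eta(self, n_chars: int) -> float:
        """Estimated seconds until a new request of n_chars would finish."""
        return (self.pending_chars + n_chars) / self.chars_per_sec

def split_text(text: str, max_chars: int) -> list:
    """Pack whole sentences into chunks of at most max_chars characters
    (a single sentence longer than max_chars becomes its own chunk)."""
    chunks, cur = [], ""
    for sent in SENTENCE.findall(text):
        if cur and len(cur) + len(sent) > max_chars:
            chunks.append(cur)
            cur = sent
        else:
            cur += sent
    if cur:
        chunks.append(cur)
    return chunks

def dispatch(text: str, servers: list, max_chars: int = 200) -> list:
    """Send each chunk to the server with the earliest estimated finish time."""
    plan = []
    for chunk in split_text(text, max_chars):
        best = min(servers, key=lambda s: s.eta(len(chunk)))
        best.pending_chars += len(chunk)  # a real service would decrement on completion
        plan.append((best.name, chunk))
    return plan
```

With one server three times faster than the other, most chunks land on the faster GPU while the slower one still absorbs part of the load, whereas strict turn-by-turn polling would give both the same share.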
Keywords/Search Tags: neural network, machine translation, text clustering, syllable, multi-engine