Research On Key Technology Of Post-optimization For Machine Translation

Posted on: 2020-08-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J G Zhu
GTID: 1368330590972756
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of machine translation technology, machine translation systems have been widely used in many fields to help people complete cross-language tasks. In most cases, however, machine translations are still not comparable to human translations. Especially for tasks with high quality requirements, people still need to polish, modify, and proofread the output of the machine translation system to ensure translation quality. In contrast to manual optimization, automatic post-translation optimization aims to further improve the translations generated by one or more translation systems, raising translation quality and reducing the cost of manual editing. In this process, however, the user data available for post-translation optimization is usually scarce or even non-existent. How to effectively improve translation quality in such small-data or zero-data situations is an important open problem in machine translation. This dissertation focuses on how to make full use of small data, construct pseudo data, or transfer general big data (parallel corpora for machine translation), and explores how to use limited human translation history or the outputs of different machine translation systems to further improve machine translation results and to provide corresponding quality assessment methods, with the goal of improving translation quality and reducing manual editing costs. The main research covers the following four aspects:

(1) Research on translation consistency optimization based on small-scale translation examples. In the scenario where only a small amount of user translation data is available, the question is how to use that data effectively to optimize the output of a general machine translation system so that it better meets the translation requirements of a specific field. This dissertation proposes a consistency optimization method based on small-scale translation examples that combines historical human translation examples with the available machine translations. The method uses a confusion network model as the combination framework and a log-linear model over multiple features to generate translations during decoding. It resolves the conflicts that arise when combining translations from different sources in small-data scenarios.

(2) Research on post-editing optimization based on pseudo feedback. In the scenario where only a small amount of user translation data is available, the question is how to use that data to learn a post-editing model for machine translation, so that machine translation errors are corrected more effectively and repetitive manual work is reduced. This dissertation proposes a post-editing optimization method based on pseudo feedback. The method uses machine translations of similar translation examples to generate pseudo feedback, effectively overcoming the sparsity of post-editing data. The method can also introduce source-language context information into the post-editing model, and thereby determine more accurately whether a given post-editing phrase rule is suitable for editing the machine translation of the source sentence to be translated.

(3) Research on combination optimization of multi-system translations based on deep learning. When the user cannot provide any relevant data, the question is how to improve machine translation quality in the extreme case of no user data. This dissertation proposes a deep multi-system combination method learned from bilingual data. The method divides the combination process into an encoding stage and a decoding stage, and uses large-scale bilingual data suited to machine translation systems and a small amount of translation fusion task data to train the parameters of the encoding and decoding processes respectively. This alleviates the problem that the combination task lacks enough training data to train the parameters of the overall model. In addition, the attention vector of the source language is used when encoding the machine translations in the encoding stage, to compensate for defects in machine translation quality, and the vocabulary and decoding space of the combination are restricted. Together these measures significantly improve translation quality.

(4) Research on deep translation quality estimation based on pseudo data. When manually annotated translation quality data is lacking, the question is how to construct and train a deep model for translation quality estimation. This dissertation proposes a machine translation quality estimation method based on pseudo data. Pseudo data is used to pre-train the parameters of the neural quality estimation model: positive and negative translation examples are constructed from a bilingual parallel corpus, so that general-domain bilingual data can be used directly to train the quality estimation model. In addition, by automatically generating erroneous translations, the scale of the labeled data is expanded on the basis of the bilingual data, and the model is trained further to improve the performance of the quality estimation model.

In summary, the main contribution of this dissertation is a series of effective post-translation optimization methods for machine translation under small-scale or zero user data. Based on small-scale translation examples, it optimizes translation consistency; based on pseudo feedback, it corrects translation errors in machine translations; and based on large-scale bilingual parallel corpora, it trains a deep combination model for multi-system translations, improving machine translation quality and reducing post-editing cost. In addition, a deep translation quality estimation model based on pseudo data is proposed, and quality estimation is improved through an effective pseudo-data generation method. Experimental results show that these methods achieve significant performance improvements over their respective baselines.
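
The pseudo-data idea in aspect (4) can be illustrated with a minimal sketch. The abstract does not say how the erroneous translations are generated, so the random delete, substitute, and swap edits below are an assumption chosen for illustration, and the function names `corrupt` and `build_pseudo_data` are hypothetical. Each reference translation from a parallel corpus is kept as a positive example (label 1), and a corrupted copy serves as a negative example (label 0).

```python
import random


def corrupt(tokens, vocab, rng, n_edits=1):
    """Return a copy of `tokens` after `n_edits` random edits.

    The edit types (delete, substitute, swap) are an illustrative
    assumption, not the procedure used in the dissertation.
    """
    out = list(tokens)
    if not out:
        return out
    for _ in range(n_edits):
        op = rng.choice(["delete", "substitute", "swap"])
        if op == "delete" and len(out) > 1:
            del out[rng.randrange(len(out))]
        elif op == "swap" and len(out) > 1:
            i = rng.randrange(len(out) - 1)
            out[i], out[i + 1] = out[i + 1], out[i]
        else:
            # Substitute one token with a different word from the vocabulary.
            i = rng.randrange(len(out))
            candidates = [w for w in vocab if w != out[i]]
            if candidates:
                out[i] = rng.choice(candidates)
    return out


def build_pseudo_data(pairs, seed=0):
    """Turn (source, reference) pairs into labeled (source, target, label) triples."""
    rng = random.Random(seed)
    vocab = sorted({w for _, tgt in pairs for w in tgt.split()})
    examples = []
    for src, tgt in pairs:
        ref = tgt.split()
        if not ref:
            continue
        examples.append((src, tgt, 1))  # reference translation: positive
        bad = corrupt(ref, vocab, rng)
        if bad != ref:  # keep only corruptions that actually changed the text
            examples.append((src, " ".join(bad), 0))  # pseudo error: negative
    return examples


if __name__ == "__main__":
    toy_pairs = [("wo ai ni", "i love you"), ("ta hen hao", "he is very good")]
    for example in build_pseudo_data(toy_pairs, seed=0):
        print(example)
```

Triples of this kind could then be used to pre-train a quality estimation model before fine-tuning on whatever human-annotated data exists; the binary labels here stand in for whatever quality signal the actual model predicts.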
Keywords/Search Tags:machine translation, post-translation optimization, automatic post-editing, multi-system translation combination, translation quality estimation