Data Augmentation Research Of Neural Machine Translation

Posted on: 2020-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
Full Text: PDF
GTID: 2428330578977610
Subject: Computer application technology
Abstract/Summary:
With the continuous development of computer technology, machine translation (MT) methods have gone through a long research process. In recent years, research on artificial neural networks has brought a new solution to machine translation, and MT performance has improved by leaps and bounds with the application of the Seq2Seq model. Neural machine translation (NMT) relies on a large amount of bilingual parallel data, which contains sufficient knowledge for the NMT system; training an NMT model is a process of data representation and knowledge extraction. How to use data augmentation so that the model learns from the data more easily and extracts knowledge more fully has therefore become an important research topic. This thesis studies data augmentation methods from two aspects.

On the one hand, simplifying complex knowledge is conducive to model learning. For example, the translation of complex numeral phrases in sentences is one of the main problems in machine translation. It is difficult for the model to learn the correct translation rules for numbers and unit information between the two languages from a limited number of examples, which leads to semantic and syntactic errors. By contrast, people can stipulate such translation rules for the MT system, and the system can then focus on the data it handles well. This thesis therefore strengthens the learning of simple numeral phrases through data augmentation, simplifies the numeral phrases in sentences, and constructs an external translation module to help the NMT system translate numeral phrases.

On the other hand, monolingual data is still not fully used in MT tasks. We find that source-side and target-side monolingual data help improve the model's encoding ability and decoding tendency during training. In addition, machine translation needs to be applied in specific domains. The bilingual parallel data in some domains is insufficient for model training, so we try to add in-domain monolingual data to the training. Training requires a source text for the encoder and its translation for the decoder. By means of the data self-learning method introduced in this thesis, we can guide the output tendency of the decoder and improve the domain adaptability of the model.

The experimental results show that the data augmentation method and the numeral phrase optimization method proposed in this thesis achieve good results in neural machine translation. The translation accuracy of complex numeral phrases is over 97%, and the BLEU score of the domain adaptation system is two points higher than that of the baseline.
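The numeral phrase simplification described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the regular expression, the `<numN>` placeholder format, and the functions `mask_numerals`, `translate_numeral`, and `restore_numerals` are all hypothetical names chosen for illustration. The idea shown is only the general pattern: replace numerals with placeholders before translation, then let an external rule-based step fill them back in afterwards.

```python
import re

# Assumed, simplified pattern: matches digit-based numerals such as "42", "3.5", "1,200".
NUM_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def mask_numerals(sentence):
    """Replace each numeral with an indexed placeholder; return the text and the mapping."""
    mapping = {}

    def _sub(match):
        token = f"<num{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    masked = NUM_PATTERN.sub(_sub, sentence)
    return masked, mapping


def translate_numeral(numeral):
    """Hypothetical stand-in for a rule-based numeral translator.

    Here it only strips thousands separators; a real module would apply
    language-specific rules for numbers and units.
    """
    return numeral.replace(",", "")


def restore_numerals(translated, mapping):
    """Fill the placeholders in the translated output using the external rules."""
    for token, numeral in mapping.items():
        translated = translated.replace(token, translate_numeral(numeral))
    return translated


masked, mapping = mask_numerals("The price rose 3.5 percent to 1,200 yuan.")
print(masked)
print(restore_numerals(masked, mapping))
```

In this sketch the NMT model would only ever see the placeholder tokens, so it does not need to learn number and unit conversion from limited examples; how the thesis actually handles units and placeholder alignment is not specified in the abstract.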
Keywords/Search Tags:Neural Machine Translation, Simplification of knowledge, Data Enhancement