
Research on Chinese-to-English Machine Translation Based on Neural Networks

Posted on: 2021-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: J P Liu
GTID: 2428330626960362
Subject: Computer Science and Technology
Abstract/Summary:
With its advantages of higher speed and lower cost, machine translation is considered a promising way to overcome communication barriers between different languages. In recent years, with the development of deep learning, neural machine translation based on the “encoder-decoder” structure has become the main research approach in machine translation. However, because of limited vocabulary size and imperfect coverage mechanisms, neural machine translation suffers from problems such as out-of-vocabulary (OOV) words, over-translation, and under-translation.

To address the OOV problem, we propose a data generalization method based on a “substitution-translation-restore” framework. First, we determine the types of OOV words to be processed in the corpus and design algorithms to recognize and align the bilingual OOV words. Second, the OOV words in both the training set and the test sets are replaced with specific generalization symbols, and the generalized corpus is then used for model training and translation prediction. Third, the OOV words are translated by dictionary-based or rule-based methods. Finally, the generalization symbols in the output of the neural machine translation model are replaced with the translations of the OOV words to produce the final translation. Experimental results show that the data generalization method significantly improves both the performance of neural machine translation systems and the translation accuracy of OOV words. Compared with the RNNSearch and Transformer baseline systems, the BLEU scores increase by 4.72% and 4.21% respectively. A further experiment on the Transformer system shows that the translation accuracy of OOV words increases by 35.16% on average.

To further alleviate the over-translation and under-translation problems, we propose a multi-coverage fusion mechanism based on the consistency and complementarity of the information stored in different coverage models. The translation information stored in both the coverage vector and the coverage score is used simultaneously to guide the attention mechanism. We first define a word-level coverage score and then propose two fusion methods. Experimental results show that our multi-coverage fusion model improves the performance of neural machine translation and, compared with other coverage models, further improves alignment quality and alleviates over-translation and under-translation.
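The “substitution-translation-restore” steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the indexed placeholder format (`$OOV_0`, `$OOV_1`, ...), the toy dictionary, the number rule, and the stand-in `toy_model` function are all assumptions made for illustration, and a real system would call a trained RNNSearch or Transformer model in place of `toy_model`.

```python
import re

# Assumption: numbers are one OOV type, handled by a "copy unchanged" rule.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def substitute(tokens, oov_dict):
    """Step 1: replace OOV tokens with indexed generalization symbols.

    Returns the generalized token list and a table mapping each symbol
    back to the source word it replaced.
    """
    generalized, table = [], {}
    for tok in tokens:
        if tok in oov_dict or NUMBER_PATTERN.fullmatch(tok):
            symbol = f"$OOV_{len(table)}"  # hypothetical symbol format
            table[symbol] = tok
            generalized.append(symbol)
        else:
            generalized.append(tok)
    return generalized, table


def translate_oov(word, oov_dict):
    """Step 3: dictionary lookup first, then the rule (copy numbers as-is)."""
    return oov_dict.get(word, word)


def restore(target_tokens, table, oov_dict):
    """Step 4: swap each symbol in the model output for its OOV translation."""
    return [
        translate_oov(table[tok], oov_dict) if tok in table else tok
        for tok in target_tokens
    ]


def toy_model(tokens):
    """Stand-in for the NMT model (step 2): word-by-word lookup that
    passes generalization symbols through untouched."""
    lexicon = {"住": "lives", "在": "in"}
    return [lexicon.get(tok, tok) for tok in tokens]


def translate_with_generalization(tokens, oov_dict, model=toy_model):
    """Run substitution, model translation, and restoration in sequence."""
    generalized, table = substitute(tokens, oov_dict)
    return restore(model(generalized), table, oov_dict)
```

With the toy dictionary `{"张伟": "Zhang Wei", "大连": "Dalian"}`, calling `translate_with_generalization(["张伟", "住", "在", "大连"], ...)` passes `["$OOV_0", "住", "在", "$OOV_1"]` to the model and restores the two names afterwards. Indexing the symbols is one simple way to keep several OOV words in a sentence aligned with their translations; the thesis may use a different symbol scheme.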
Keywords/Search Tags: neural machine translation, data generalization, over-translation, under-translation, multi-coverage