
Towards Robust Neural Machine Translation With ASR Errors

Posted on: 2024-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Hu
Full Text: PDF
GTID: 2568306941964099
Subject: Computer technology
Abstract/Summary:
Machine translation is a task of wide interest in natural language processing. In recent years, end-to-end neural machine translation (NMT) models have achieved excellent translation performance. However, NMT models are highly susceptible to input noise: when translating the output of automatic speech recognition (ASR), translation quality drops drastically. Because no Chinese-to-English translation test set with natural Chinese ASR output exists, prior studies have artificially added noise to Chinese sentences to evaluate translation performance. To improve the robustness of NMT models when translating noisy sentences containing ASR errors, this thesis carries out research from the following three perspectives:

(1) Robust neural machine translation with contrastive learning. Because no Chinese-to-English translation test set with natural Chinese ASR output exists, this thesis constructs one from the NIST translation test set (680 documents, 7,688 sentences): the sentences are read aloud by human speakers and transcribed by an ASR system, yielding a test set called NISTasr. To enhance translation performance on NISTasr, a contrastive learning framework is used to narrow the gap between the representation of the original input and that of its perturbed counterpart. We first construct positive and negative samples, then strengthen the model's resistance to interference by constraining it to reduce the distance between the representations of the clean source sequence and its noisy counterpart. Experimental results show that this method effectively improves translation performance on NISTasr.

(2) Robust neural machine translation with rewriting-based data augmentation. Although machine translation models achieve very good performance, they still suffer from the mismatch between the data distributions of the training and inference stages. To alleviate this problem, we propose a controlled generation method for domain adaptation. The proposed method makes better use of large-scale source-domain data (formal text) and small-scale target-domain data (ASR output). Specifically, by conditioning on an effective counterfactual condition (the concatenation of source-domain text and a target-domain label), counterfactual data are constructed to expand the training data and bridge the gap between the source and target domains. Experimental results show that this method effectively enhances the model's robustness on NISTasr data.

(3) Robust neural machine translation with document-level context. Sentence-level machine translation models take individual sentences as input and ignore document-level context. This thesis therefore aims to improve robustness with document-level context. It first proposes a data augmentation method that generates sentences with natural ASR-like errors. It then proposes joint learning of document-level translation and restoration, in which document-level context helps both translate and restore the current sentence under a multi-task learning framework.
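The contrastive objective in point (1) can be illustrated with a minimal InfoNCE-style loss: given the representation of a clean sentence (anchor), the representation of its ASR-perturbed version (positive), and representations of unrelated sentences (negatives), the loss is small when anchor and positive are close and large otherwise. This is a generic sketch of the framework, not the thesis's exact formulation; the function names, vectors, and temperature value are illustrative.

```python
import math

def infonce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the clean (anchor) and
    noisy (positive) sentence representations together, push the
    anchor away from unrelated (negative) representations."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    pos = math.exp(cos(anchor, positive) / temperature)
    neg = sum(math.exp(cos(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# A positive that is nearly identical to the anchor yields a much
# smaller loss than one that is orthogonal to it.
low = infonce_loss([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0]])
high = infonce_loss([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0]])
```

In the thesis's setting, the vectors would be encoder outputs for the clean source sentence and its ASR transcript, and the loss would be added to the standard translation objective.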
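The counterfactual construction in point (2) amounts to pairing source-domain text with a target-domain label. A minimal sketch, assuming the label is realized as a special token prepended to the source sentence (the tag strings `<text>`/`<asr>` and the helper names are hypothetical, not the thesis's actual tokens):

```python
def make_counterfactual(src_sentence, domain_tag):
    """Prepend a domain label to a source sentence, so the model can
    condition on the (possibly counterfactual) domain of the input."""
    return f"{domain_tag} {src_sentence}"

def build_training_data(formal_pairs, asr_pairs):
    """Mix three kinds of (source, target) pairs:
    - large-scale formal text tagged with its true domain,
    - counterfactual copies of the formal text tagged as ASR output,
    - the small-scale real ASR pairs tagged as ASR output."""
    data = [(make_counterfactual(s, "<text>"), t) for s, t in formal_pairs]
    data += [(make_counterfactual(s, "<asr>"), t) for s, t in formal_pairs]
    data += [(make_counterfactual(s, "<asr>"), t) for s, t in asr_pairs]
    return data

formal = [("今天 天气 很 好", "the weather is nice today")]
asr = [("今天 天汽 很 好", "the weather is nice today")]
data = build_training_data(formal, asr)
```

The counterfactual copies let the abundant formal-text pairs stand in for scarce ASR-domain pairs, bridging the two distributions during training.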
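Point (3) first generates sentences with natural ASR-like errors. One common way to simulate Chinese ASR substitution errors is homophone replacement; the toy table and rate below are purely illustrative, since the abstract does not specify the thesis's generation method (a real system would derive confusions from a pinyin dictionary or ASR confusion statistics).

```python
import random

# Toy homophone table (illustrative only).
HOMOPHONES = {"他": ["她", "它"], "在": ["再"], "做": ["作"]}

def add_asr_noise(sentence, p=0.5, seed=0):
    """Replace characters with homophones at rate p, simulating the
    substitution errors an ASR system makes on Chinese speech."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for ch in sentence:
        if ch in HOMOPHONES and rng.random() < p:
            out.append(rng.choice(HOMOPHONES[ch]))
        else:
            out.append(ch)
    return "".join(out)

noisy = add_asr_noise("他在做事", p=1.0)
```

Each noisy sentence keeps its original translation as the target, and the restoration task of the multi-task framework learns to map the noisy sentence back to the clean one.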
Keywords/Search Tags:Machine translation, Robustness, Document-level translation, Contrastive learning, Data augmentation