Neural machine translation has become the predominant machine translation technology in both research and practice, demonstrating outstanding performance on large-scale parallel corpora. However, when confronted with translation tasks in specific domains, translation performance can decline significantly due to the lack of in-domain parallel data. As parallel data in specific domains is often scarce or non-existent, domain adaptation has emerged as an important research topic. The core idea of domain adaptation is to make full use of existing resources to improve model performance in low-resource domains. To explore how to enhance the domain adaptability of neural machine translation systems, this thesis first designs corresponding domain adaptation methods for scenarios with and without an in-domain parallel corpus. Furthermore, considering domain characteristics, the thesis proposes a domain adaptation method that incorporates word-level domain information to address highly specialized translation scenarios. The main research of this thesis is as follows:

(1) For scenarios with limited in-domain parallel data, a domain adaptation method for neural machine translation based on data augmentation is proposed. Fine-tuning is one of the classical domain adaptation methods, but the parallel data in low-resource domains is small in scale, so direct fine-tuning is prone to overfitting. For this reason, this thesis proposes a data augmentation method that creates mixed-language sentence pairs, expanding the training data by randomly concatenating sentences in different languages. This not only increases the number of parallel sentence pairs but also lets the mixed-language data enrich the semantic information available to the model during training. The method is suitable not only for the standard domain adaptation scenario but can also be extended to the semi-supervised domain adaptation scenario. Experimental results in both scenarios show that the method effectively improves the translation performance of domain adaptation models.

(2) For scenarios with no in-domain parallel data, an unsupervised domain adaptation method based on improving the quality of pseudo-parallel sentence pairs is proposed. The most widely effective of the existing unsupervised approaches generates pseudo-corpora from monolingual data, but cross-domain translations contain many errors, which degrade model performance. This thesis improves the quality of pseudo-parallel sentence pairs from both the model and the data perspectives, thereby enhancing the model's domain adaptability. First, on the model side, a more reasonable pre-training strategy is proposed to improve the model's generalization ability. Then, on the data side, sentence-level sentiment information is fused for posterior filtering to further improve the quality of the pseudo-corpora. Experimental results and analyses show that this method effectively improves translation results, confirming the effectiveness of the proposed domain adaptation approach.

(3) Delving further into the fine-grained word level, a terminology-aware neural machine translation domain adaptation method is proposed for more specialized domains. In some specialized fields, accurate translation of terminology is crucial to translation quality. Moreover, terminology often exhibits polysemy in practice, which degrades translation quality. To address this issue, this thesis proposes an efficient two-stage framework that first performs context-based terminology disambiguation and then integrates the disambiguated terms into the translation model. By building a terminology-aware translation system, domain-specific terms can be expressed accurately, improving translation quality without losing domain information. Experimental results show that the method makes full use of terminology knowledge and improves in-domain translation results.
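As a concrete illustration of the mixed-language augmentation described in (1), the following is a minimal Python sketch. The pairing and concatenation scheme (building a mixed-language source by joining the source sentence of one pair with the target sentence of another) is an assumption made for illustration only; the abstract does not spell out the exact recipe.

```python
import random

def mixed_language_pairs(parallel_pairs, num_augmented, seed=0):
    """Generate synthetic training pairs by randomly connecting sentences
    from two different parallel pairs, so the new source side mixes two
    languages.

    parallel_pairs : list of (src_sentence, tgt_sentence) tuples
    num_augmented  : number of synthetic pairs to create

    NOTE: the concatenation scheme below (source = src_i + tgt_j,
    target = tgt_i + tgt_j) is one plausible reading of the abstract,
    given purely for illustration.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(num_augmented):
        (src_i, tgt_i), (src_j, tgt_j) = rng.sample(parallel_pairs, 2)
        mixed_src = f"{src_i} {tgt_j}"   # mixed-language source side
        mixed_tgt = f"{tgt_i} {tgt_j}"   # single-language target side
        augmented.append((mixed_src, mixed_tgt))
    return augmented

# Toy usage with hypothetical German-English in-domain pairs.
if __name__ == "__main__":
    corpus = [
        ("Das Ventil ist defekt.", "The valve is defective."),
        ("Der Druck steigt.", "The pressure is rising."),
        ("Bitte den Filter wechseln.", "Please replace the filter."),
    ]
    for pair in mixed_language_pairs(corpus, num_augmented=2):
        print(pair)
```

Under this reading, the synthetic pairs are simply appended to the original in-domain data before fine-tuning, which reflects the stated goal of the method: more parallel sentence pairs without any additional annotation.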