Machine Translation Quality Estimation (QE) aims to evaluate the quality of machine translation output without reference translations. QE can reduce the workload of post-editing and can serve as a feedback metric for translation systems during training. Earlier work approached QE with statistical and deep learning methods; since the emergence of multilingual pre-trained models, pre-trained-model-based QE has become mainstream owing to the models' powerful feature extraction capabilities. However, pre-trained models are growing rapidly in parameter count, so ensembling several of them incurs excessive inference overhead, and integrating the different features of multiple models under limited computational resources has become a pressing problem. In addition, because manual annotation is difficult, most QE datasets contain only a few thousand training samples, leading to severe data sparsity.

In this paper, we study QE in the low-resource setting, targeting these two problems: feature integration and data sparsity. Our contributions consist of three parts. (1) To cope with the excessive cost of ensembling multiple pre-trained models, we propose an iterative ensemble distillation algorithm that integrates the knowledge of multiple pre-trained models into a single model; QE performance improves greatly with no additional inference overhead. (2) To address data scarcity, we propose to train QE models on unlabeled parallel data with contrastive learning, and we leverage a multi-source denoising autoencoder to construct the negative examples. (3) We propose a data augmentation framework based on pre-trained models, which leverages different understanding and generative pre-trained models to construct synthetic data with the same distribution as the original data, transferring the decoding space with the help of knowledge distillation.

Experiments on several public datasets, including WMT (Workshop on Machine Translation) and CCMT (China Conference on Machine Translation), demonstrate the effectiveness of the proposed methods. Our study taps the potential of pre-trained models for QE and improves their accuracy and practicality in resource-limited scenarios.
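To make contribution (1) concrete, the following is a minimal sketch of one ensemble-distillation step for sentence-level QE, assuming frozen teacher models and a student that all map an encoded (source, translation) batch to scalar quality scores. The function name, the regression formulation, and the loss weighting `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of ensemble distillation for sentence-level QE (regression).
# `teachers` are assumed to be frozen pre-trained QE models that each map a
# batch of (source, translation) inputs to a tensor of scalar quality scores;
# `student` shares the same interface. All names here are hypothetical.

def distill_step(student, teachers, batch, gold_scores, optimizer, alpha=0.5):
    """One training step: fit the student to both the gold labels and the
    averaged predictions of the teacher ensemble."""
    with torch.no_grad():
        # Average the teachers' predictions to form the ensemble signal.
        teacher_scores = torch.stack([t(batch) for t in teachers]).mean(dim=0)
    student_scores = student(batch)
    mse = nn.functional.mse_loss
    # alpha trades off gold-label supervision against the ensemble signal.
    loss = alpha * mse(student_scores, gold_scores) \
         + (1 - alpha) * mse(student_scores, teacher_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In an iterative variant, the converged student could be added back to (or replace a member of) the teacher pool before the next round; the exact iteration schedule is a design choice the abstract does not specify.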
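For contribution (2), a standard way to realize a contrastive objective on unlabeled parallel data is an InfoNCE-style loss in which genuine (source, translation) pairs act as positives and corrupted translations as negatives. The paper constructs negatives with a multi-source denoising autoencoder; the sketch below simply assumes the negative embeddings are already given. Tensor shapes and names are hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce(src_emb, pos_emb, neg_emb, temperature=0.1):
    """InfoNCE-style contrastive loss for QE pre-training.

    src_emb: (B, D) embeddings of source sentences (anchors)
    pos_emb: (B, D) embeddings of genuine parallel translations (positives)
    neg_emb: (B, K, D) embeddings of K corrupted translations per source
             (negatives, e.g. produced by a denoising autoencoder)
    """
    src = F.normalize(src_emb, dim=-1)
    pos = F.normalize(pos_emb, dim=-1)
    neg = F.normalize(neg_emb, dim=-1)
    pos_sim = (src * pos).sum(dim=-1, keepdim=True)         # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", src, neg)          # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    # The positive pair always sits in column 0 of the logits.
    labels = torch.zeros(src.size(0), dtype=torch.long, device=src.device)
    return F.cross_entropy(logits, labels)

# Usage with random embeddings (batch of 8, dim 512, 4 negatives per source):
loss = info_nce(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 4, 512))
```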
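Contribution (3) can be read as pseudo-labelled data augmentation: a generative pre-trained MT model samples translations for unlabeled source sentences, and a teacher QE model scores them so a student can be distilled on the synthetic pairs. This is one plausible reading of the abstract; the model name and the `teacher_qe` scorer below are illustrative assumptions, and the paper's exact pipeline may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical choice of generative pre-trained MT model.
name = "Helsinki-NLP/opus-mt-en-de"
tok = AutoTokenizer.from_pretrained(name)
mt = AutoModelForSeq2SeqLM.from_pretrained(name)

def augment(sources, teacher_qe, num_samples=4):
    """Build synthetic (source, hypothesis, pseudo-score) triples.

    `teacher_qe` is an assumed callable mapping (source, hypothesis) to a
    scalar quality score; it stands in for the teacher model whose decoding
    space is transferred to the student via knowledge distillation.
    """
    synthetic = []
    for src in sources:
        inputs = tok(src, return_tensors="pt")
        # Sample several hypotheses to cover more of the decoding space.
        outs = mt.generate(**inputs, do_sample=True,
                           num_return_sequences=num_samples,
                           max_new_tokens=64)
        for hyp in tok.batch_decode(outs, skip_special_tokens=True):
            with torch.no_grad():
                score = teacher_qe(src, hyp)   # pseudo-label from the teacher
            synthetic.append((src, hyp, float(score)))
    return synthetic
```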