Machine Translation Quality Estimation (QE) aims to evaluate the quality of machine translation output without reference translations. QE can reduce the workload of post-editing and can serve as a feedback metric for translation systems during training. Earlier work approached QE with statistical and deep learning methods; since the emergence of multilingual pre-trained models, pre-trained-model-based QE has become mainstream owing to the models' powerful feature extraction capabilities. However, pre-trained models are growing rapidly in parameter count, so ensembling several of them incurs excessive inference overhead, and integrating the different features of multiple models under limited computational resources has become a pressing problem. In addition, because manual annotation is difficult, most QE datasets contain only a few thousand training samples, leading to severe data sparsity.

In this paper, we study QE in the low-resource setting, targeting these two problems: feature integration and data sparsity. Our contributions consist of three parts. (1) To cope with the excessive cost of ensembling multiple pre-trained models, we propose an iterative ensemble distillation algorithm that integrates the knowledge of multiple pre-trained models into a single model; QE performance improves greatly with no additional inference overhead. (2) To address data scarcity, we propose to train QE models on unlabeled parallel data with contrastive learning, and we leverage a multi-source denoising autoencoder to construct the negative examples. (3) We propose a data augmentation framework based on pre-trained models, which leverages different understanding and generative pre-trained models to construct synthetic data with the same distribution as the original data, transferring the decoding space with the help of knowledge distillation.

Experiments on several public datasets, including WMT (Workshop on Machine Translation) and CCMT (China Conference on Machine Translation), demonstrate the effectiveness of the proposed methods. Our study taps the potential of pre-trained models for QE and improves their accuracy and practicality in resource-limited scenarios.
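To make contribution (1) concrete, the following is a minimal sketch of one ensemble-distillation step for sentence-level QE, assuming frozen teacher models and a student that all map an encoded (source, translation) batch to scalar quality scores. The function name, the regression formulation, and the loss weighting `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of ensemble distillation for sentence-level QE (regression).
# `teachers` are assumed to be frozen pre-trained QE models that each map a
# batch of (source, translation) inputs to a tensor of scalar quality scores;
# `student` shares the same interface. All names here are hypothetical.

def distill_step(student, teachers, batch, gold_scores, optimizer, alpha=0.5):
    """One training step: fit the student to both the gold labels and the
    averaged predictions of the teacher ensemble."""
    with torch.no_grad():
        # Average the teachers' predictions to form the ensemble signal.
        teacher_scores = torch.stack([t(batch) for t in teachers]).mean(dim=0)
    student_scores = student(batch)
    mse = nn.functional.mse_loss
    # alpha trades off gold-label supervision against the ensemble signal.
    loss = alpha * mse(student_scores, gold_scores) \
         + (1 - alpha) * mse(student_scores, teacher_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In an iterative variant, the converged student could be added back to (or replace a member of) the teacher pool before the next round; the exact iteration schedule is a design choice the abstract does not specify.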
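For contribution (2), a standard way to realize a contrastive objective on unlabeled parallel data is an InfoNCE-style loss in which genuine (source, translation) pairs act as positives and corrupted translations as negatives. The paper constructs negatives with a multi-source denoising autoencoder; the sketch below simply assumes the negative embeddings are already given. Tensor shapes and names are hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce(src_emb, pos_emb, neg_emb, temperature=0.1):
    """InfoNCE-style contrastive loss for QE pre-training.

    src_emb: (B, D) embeddings of source sentences (anchors)
    pos_emb: (B, D) embeddings of genuine parallel translations (positives)
    neg_emb: (B, K, D) embeddings of K corrupted translations per source
             (negatives, e.g. produced by a denoising autoencoder)
    """
    src = F.normalize(src_emb, dim=-1)
    pos = F.normalize(pos_emb, dim=-1)
    neg = F.normalize(neg_emb, dim=-1)
    pos_sim = (src * pos).sum(dim=-1, keepdim=True)         # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", src, neg)          # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    # The positive pair always sits in column 0 of the logits.
    labels = torch.zeros(src.size(0), dtype=torch.long, device=src.device)
    return F.cross_entropy(logits, labels)

# Usage with random embeddings (batch of 8, dim 512, 4 negatives per source):
loss = info_nce(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 4, 512))
```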
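Contribution (3) can be read as pseudo-labelled data augmentation: a generative pre-trained MT model samples translations for unlabeled source sentences, and a teacher QE model scores them so a student can be distilled on the synthetic pairs. This is one plausible reading of the abstract; the model name and the `teacher_qe` scorer below are illustrative assumptions, and the paper's exact pipeline may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical choice of generative pre-trained MT model.
name = "Helsinki-NLP/opus-mt-en-de"
tok = AutoTokenizer.from_pretrained(name)
mt = AutoModelForSeq2SeqLM.from_pretrained(name)

def augment(sources, teacher_qe, num_samples=4):
    """Build synthetic (source, hypothesis, pseudo-score) triples.

    `teacher_qe` is an assumed callable mapping (source, hypothesis) to a
    scalar quality score; it stands in for the teacher model whose decoding
    space is transferred to the student via knowledge distillation.
    """
    synthetic = []
    for src in sources:
        inputs = tok(src, return_tensors="pt")
        # Sample several hypotheses to cover more of the decoding space.
        outs = mt.generate(**inputs, do_sample=True,
                           num_return_sequences=num_samples,
                           max_new_tokens=64)
        for hyp in tok.batch_decode(outs, skip_special_tokens=True):
            with torch.no_grad():
                score = teacher_qe(src, hyp)   # pseudo-label from the teacher
            synthetic.append((src, hyp, float(score)))
    return synthetic
```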