Quality Estimation Of Machine Translation Using Pre-training Language Model

Posted on: 2020-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Yang
GTID: 2428330575495001
Subject: Computer technology
Abstract/Summary:
In recent years, neural machine translation has achieved major breakthroughs and has been rapidly adopted in practice. However, several problems remain open, including machine translation quality estimation, out-of-vocabulary words, long-sentence translation, over-translation, and omission. Machine translation quality estimation (QE) studies how to assess the quality of machine translation output without a reference translation. Results in this area can help a machine translation system filter out low-quality output, support the construction of high-quality parallel corpora, and reduce the workload of post-editing. The topic therefore has both research significance and practical value.

Existing QE methods fall into two categories: those based on machine learning and those based on deep learning. Both aim to extract features closely related to the QE task, and the quality of the extracted features determines system performance. Recently, pre-trained language models have set new best results on many natural language processing tasks and have shown strong representation learning ability. This thesis therefore explores how to integrate pre-trained language models into QE in order to improve QE performance. The main work and contributions of this thesis are:

(1) A machine translation quality estimation method is proposed that combines features extracted from pre-trained language models such as ELMo, GPT, and BERT with features extracted by the "bilingual expert" model. The features from the two sources complement each other and effectively alleviate the feature sparsity problem in QE. Experimental results show significant improvements on both the sentence-level task and the word-level task.

(2) A sentence-level machine translation quality estimation method based on a BERT+LSTM+MLP architecture is proposed. An LSTM network encodes the high-level features that multilingual BERT extracts from the source and target sentences into fixed-size vectors, which are then fed into a fully connected network to produce the predicted quality score. Experimental results show that this method reaches the best level reported for QE at the time of writing.

(3) A machine translation quality estimation method based on dependency syntax information is proposed. The dependency label of each word in the source sentence and in the target translation is converted into a vector representation, concatenated with the word vector, and fed into the model for training, so that the model learns syntactic structure information. Experimental results show that this further improves the performance of the QE model.

In summary, this thesis proposes a method for integrating pre-trained language models and dependency syntax information into the QE task, and verifies the effectiveness, advancement, and practicality of the proposed method through experiments.
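
The feature concatenation described in contribution (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy sentence, the dependency labels, and the vector sizes are all hypothetical, and random vectors stand in for the learned word and label embeddings.

```python
import random


def make_embedding_table(symbols, dim, seed):
    """Map each symbol to a small random vector (a stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in symbols}


def concat_features(tokens, dep_labels, word_table, label_table):
    """Concatenate each token's word vector with its dependency-label vector."""
    if len(tokens) != len(dep_labels):
        raise ValueError("tokens and dep_labels must have the same length")
    return [word_table[t] + label_table[d] for t, d in zip(tokens, dep_labels)]


# Hypothetical toy sentence and dependency labels (not taken from the thesis).
tokens = ["the", "cat", "sleeps"]
dep_labels = ["det", "nsubj", "root"]

# Assumed sizes: 4-dimensional word vectors, 2-dimensional label vectors.
word_table = make_embedding_table(sorted(set(tokens)), dim=4, seed=0)
label_table = make_embedding_table(sorted(set(dep_labels)), dim=2, seed=1)

# One 6-dimensional vector per token: word part followed by label part.
features = concat_features(tokens, dep_labels, word_table, label_table)
```

In a real QE model the two tables would be trainable embedding layers, and the concatenated vectors would be passed to the downstream encoder in place of the plain word vectors.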
Keywords/Search Tags: Quality Estimation, Machine Translation, Neural Network, Language Model, Machine Learning