
Research On The Method Of Paragraph-level Neural Machine Translation Integrating Paragraph Information

Posted on: 2024-02-03 | Degree: Master | Type: Thesis
Country: China | Candidate: X W Chen | Full Text: PDF
GTID: 2568307109487564 | Subject: Computer technology
Abstract/Summary:
Document-level neural machine translation (NMT) is the key task of translating full articles in machine translation, and it is also an important means of improving translation performance. However, current multi-encoder document-level NMT methods all encode only local context information and do not fully consider global context information, so the extracted context is incomplete and document-level NMT performs poorly. Aiming at these problems, this thesis studies how to use contextual information effectively to guide document-level NMT to generate high-quality translations and improve the performance of document-level NMT.

Firstly, this thesis introduces the research status, existing problems, and challenges of document-level parallel sentence pair extraction and document-level neural machine translation. Secondly, based on a document-level comparable corpus collected from the web, a document-level parallel corpus extraction method based on metric learning is used to build a parallel corpus; on this basis, a document-level NMT method that integrates topic information is studied, so that the model can simultaneously attend to local context sentences and the topic of the document and capture context more comprehensively. Thirdly, this thesis studies a document-level NMT method based on document repair, which enables the model to repair errors in document translations. Finally, a document-level NMT prototype system is developed, and future research directions and development trends are introduced.

The main contributions of this thesis are as follows:

(1) A metric learning-based document-level parallel corpus extraction method is proposed, which calculates the similarity between document pairs so as to extract similar document pairs more accurately. The specific idea is to first embed the sentences of the bilingual documents with a pre-trained word embedding model, and to compute sentence weights from the document embedding. Secondly, a metric learning model is introduced to calculate the distance between document pairs; the previously calculated weights and sentence-pair distances are then substituted into the greedy mover distance formula to compute the distance between document pairs. Finally, aligned documents are extracted according to the minimum distance. The experimental results show that the model improves recall by 0.03, outperforming the baseline.

(2) A document-level machine translation method that integrates topic information is proposed to generate higher-quality translations. The specific idea, based on the Transformer model, is to use a context encoder to encode local context sentences while introducing a topic representation encoder to extract the topic representation of the encoded source sentences; the outputs of the two encoders are then incorporated into the decoder. In this way, the model can capture more context information. The experimental results show that the method significantly improves BLEU over the baseline on the Chinese-English, English-French, and English-German datasets, significantly improving machine translation performance.

(3) Current multi-encoder document-level NMT methods attend only to local context information, which leads to pronoun translations that are inconsistent with their context across a document. This thesis proposes a document-level NMT approach based on a document repair model. The specific idea is to use the Transformer to encode the machine translation together with its context sentences, while a second encoder encodes the source sentence with its context sentences; the outputs of the two encoders are then combined in the decoder, mapping translations that are inconsistent with their context to consistent ones. During training, the attention weights of the encoder and decoder are replaced with the attention weights of the pre-trained BERT model. The experimental results show that the method significantly improves over the baseline model on the Chinese-English and Chinese-Vietnamese datasets.
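The greedy mover distance step in contribution (1) can be sketched as follows. This is a minimal illustration, assuming cosine distance between sentence embeddings and a greedy transport scheme; the function names and the exact weighting are illustrative, not the thesis's actual formulation.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def greedy_mover_distance(src_emb, tgt_emb, src_w, tgt_w):
    """Greedy approximation of the transport cost between two documents.

    src_emb / tgt_emb: lists of sentence embedding vectors.
    src_w / tgt_w: sentence weights, each list summing to 1.
    Weight is repeatedly moved along the cheapest remaining sentence pair.
    """
    src_w, tgt_w = list(src_w), list(tgt_w)
    dist = [[cosine_distance(u, v) for v in tgt_emb] for u in src_emb]
    cost = 0.0
    while sum(src_w) > 1e-9 and sum(tgt_w) > 1e-9:
        # Cheapest pair that still has weight left on both sides.
        i, j = min(
            ((i, j) for i in range(len(src_w)) for j in range(len(tgt_w))
             if src_w[i] > 1e-9 and tgt_w[j] > 1e-9),
            key=lambda p: dist[p[0]][p[1]],
        )
        flow = min(src_w[i], tgt_w[j])
        cost += flow * dist[i][j]
        src_w[i] -= flow
        tgt_w[j] -= flow
    return cost
```

In the extraction pipeline described above, each candidate source document would be paired with the target document yielding the minimum such distance, and pairs below a threshold kept as aligned.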
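Contributions (2) and (3) both feed the outputs of two encoders into one decoder. The abstract does not specify the fusion mechanism, so the sketch below shows one common choice, a learned scalar gate g that interpolates per dimension between a context-encoder state and a topic-encoder state; all parameter names are assumptions for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_ctx, h_topic, w_ctx, w_topic, bias):
    """Fuse a context-encoder state and a topic-encoder state.

    g = sigmoid(w_ctx . h_ctx + w_topic . h_topic + bias) is a scalar gate;
    the fused state is g * h_ctx + (1 - g) * h_topic, which the decoder
    would then attend over in place of a single encoder output.
    """
    score = sum(w * h for w, h in zip(w_ctx, h_ctx))
    score += sum(w * h for w, h in zip(w_topic, h_topic))
    g = sigmoid(score + bias)
    return [g * c + (1.0 - g) * t for c, t in zip(h_ctx, h_topic)]
```

With a strongly positive gate score the fused state follows the context encoder; with a strongly negative score it follows the topic encoder, letting the model weigh local context against document topic per position.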
Keywords/Search Tags:Document Alignment, Multiple Encoder, Document Translation, Document Context, Translation Repair