Research on Constrained Machine Translation

Posted on: 2022-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Mo
GTID: 2518306725981289
Subject: Computer technology
Abstract/Summary:
The world's many countries and regions speak many different languages. As economic and cultural globalization advances, the demand for cross-language communication keeps growing, and people need machines that can automatically translate other languages into ones they know. Machine translation was created to meet this demand and has become one of the most active topics in natural language processing. In recent years, Neural Machine Translation (NMT) has made remarkable progress and has become the mainstream approach to machine translation.

However, current NMT systems still have limitations. First, unlike earlier translation systems, which were guided by additional linguistic information, NMT automatically extracts the features needed for translation through a neural network. This gives the model a great deal of freedom, so users cannot predict which words or syntactic structures it will generate. Second, to train the model efficiently, the vocabulary is usually fixed in size, so unknown words are inevitable; they fail to translate and degrade translation performance. Third, the data distribution often has a long tail, so rare words are not adequately trained and the model performs poorly when generating sentences that contain them. Fourth, the input to NMT is a single sentence, which may lack context; this leads to errors in translating polysemous words, and when an article is translated sentence by sentence, a polysemous word may be translated inconsistently. Finally, the neural network encodes all the information in the source sentence into a high-dimensional vector, which may lose some syntactic information and cause the structure of the translation to deviate. It is therefore necessary to impose word and syntax constraints on the machine translation model.

This thesis focuses on two aspects: word constraints and syntax constraints.

For word constraints, this thesis proposes a word-constrained translation model based on extra attention, which uses extra attention as the model's copy pointer and extra tags to supply constraint information to the model. Compared with the previous method of extracting pointers from multi-head attention, this method is more stable and helps improve the model's copy performance. In addition, the model can make better use of constraint information to improve translation performance under unconstrained conditions. To better demonstrate the effect of word-constrained translation, a Chinese-English word-constrained translation system is built on the word-constrained model of this thesis. The system is implemented with a Web interface, which facilitates human-computer interaction and makes the impact of word constraints on translation directly observable.

For syntax constraints, the syntactic sequence is obtained by sampling the syntax tree, the data set is expanded through data augmentation, and the syntactic sequence is used as a template during generation to constrain the syntactic structure produced by the model. Experiments show that introducing syntactic constraints helps improve the model's translation performance. In addition, applying word constraints at the same time slightly improves the copy performance of word constraints.
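
The abstract does not give formulas for the copy pointer, so the following is only an illustrative sketch of how an attention distribution over constraint tokens could act as a copy pointer. It assumes the common pointer-generator style mixture, in which a gate blends the decoder's vocabulary distribution with the attention weights. The function name, the gate value, and the toy tokens are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def mix_copy_distribution(
    vocab_probs: Dict[str, float],
    constraint_tokens: List[str],
    copy_attention: List[float],
    copy_gate: float,
) -> Dict[str, float]:
    """Blend a generation distribution with a copy distribution.

    Assumption (not from the thesis): pointer-generator style mixing,
    p(w) = (1 - g) * p_vocab(w) + g * sum of attention on positions holding w.
    """
    if len(constraint_tokens) != len(copy_attention):
        raise ValueError("one attention weight is needed per constraint token")
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")

    # Scale the ordinary generation probabilities by (1 - gate).
    mixed = {tok: (1.0 - copy_gate) * p for tok, p in vocab_probs.items()}

    # Add the copy mass; tokens outside the vocabulary become reachable here.
    for tok, weight in zip(constraint_tokens, copy_attention):
        mixed[tok] = mixed.get(tok, 0.0) + copy_gate * weight
    return mixed


# Toy example: "Nanjing" is a constraint word missing from the fixed vocabulary.
vocab_probs = {"the": 0.5, "cat": 0.3, "<unk>": 0.2}
constraint_tokens = ["Nanjing", "cat"]
copy_attention = [0.9, 0.1]  # hypothetical extra-attention weights

mixed = mix_copy_distribution(vocab_probs, constraint_tokens, copy_attention, copy_gate=0.6)
best = max(mixed, key=mixed.get)
print(best, round(mixed[best], 2))
```

In this toy setting the out-of-vocabulary constraint word receives the highest probability because the copy attention concentrates on it, which is the behavior a copy pointer is meant to provide for unknown and rare words.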
Keywords/Search Tags: Neural Machine Translation, Constrained Generation, Syntactic Constraint, Word/Phrase Constraint