Research On Neural Machine Translation Technology Integrating Domain Knowledge

Posted on: 2021-06-16 | Degree: Master | Type: Thesis
Country: China | Candidate: D X Han | Full Text: PDF
GTID: 2518306329984239 | Subject: Automation Technology
Abstract/Summary:
With the support of large-scale corpora, neural machine translation has reached a very high level of translation quality. As an active research topic, machine translation has evolved from rule-based machine translation to statistical machine translation and has achieved remarkable results. The rise of big data and deep learning has brought new techniques and methods to the field. Machine translation based on deep learning uses a nonlinear network model to convert one sequence into another, but relying on neural networks alone to convert between natural languages has two shortcomings: (1) neural machine translation performs well only with the support of a large-scale corpus, and its translation quality is unsatisfactory when the corpus for a specific domain is insufficient; (2) neural machine translation does not explicitly learn domain knowledge when modeling a specific domain.

To address the first shortcoming, this thesis approaches the problem from the perspective of data augmentation and borrows the idea of a translation template: nouns or terms in each sentence are identified and extracted, while the main frame of the sentence is retained. A pseudo-parallel corpus is then generated by recombining the extracted term set with the sentence frames. Finally, the perplexity of each generated sentence is calculated and used to control the quality of the pseudo-parallel corpus. This method alleviates the weak generalization ability that a neural network shows when the corpus is insufficient. Experimental results show that the BLEU score of the translations obtained with this method is 2.32 points higher than that of the baseline system.

To address the second shortcoming, this thesis deals with the translation problems of terms, phrases, and framework knowledge that arise from the industry-specific nature of utility model patents. First, domain knowledge is introduced in the form of embedded annotations, and framework knowledge, phrase knowledge, and terminology knowledge are each incorporated into the translation model separately. Then all of the domain knowledge is integrated into the translation model together, which further improves translation quality. Experimental results show that, compared with the baseline model, the BLEU scores of the Chinese-English and English-Chinese models increase by 1.28 and 2.08 points respectively.

Finally, the two methods proposed in this thesis are combined to design and implement an English-Chinese neural machine translation system, whose translation quality improves on that of the baseline model. The experimental results show that both proposed methods can effectively improve the translation quality of neural machine translation.
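
The data-augmentation procedure summarized in the abstract (extract terms, keep the sentence frame, recombine terms with the frame, filter by perplexity) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the `<TERM>` placeholder, the single-slot templates, the bilingual term list, the add-one-smoothed bigram language model, the choice to score the whitespace-tokenized English side, and all function names.

```python
import math
from collections import Counter

SLOT = "<TERM>"  # hypothetical placeholder marking where a term was removed


def extract_template(src, tgt, term_pairs):
    """Replace the first term pair found on both sides with SLOT.

    Returns (src_template, tgt_template), or None if no known term pair occurs.
    Matching is plain substring matching, which is a simplification.
    """
    for s_term, t_term in term_pairs:
        if s_term in src and t_term in tgt:
            return src.replace(s_term, SLOT, 1), tgt.replace(t_term, SLOT, 1)
    return None


def fill_template(template, term_pairs):
    """Recombine every term pair with the sentence frame."""
    src_tpl, tgt_tpl = template
    return [(src_tpl.replace(SLOT, s), tgt_tpl.replace(SLOT, t))
            for s, t in term_pairs]


class BigramLM:
    """Add-one-smoothed bigram model over whitespace tokens (toy stand-in)."""

    def __init__(self, sentences):
        self.context = Counter()  # counts of tokens used as bigram contexts
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(set(self.context) | {"</s>"})

    def perplexity(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))


def augment(parallel, term_pairs, lm, max_ppl):
    """Generate pseudo-parallel pairs and keep those at or below max_ppl."""
    seen = set(parallel)
    generated = []
    for src, tgt in parallel:
        template = extract_template(src, tgt, term_pairs)
        if template is None:
            continue
        for pair in fill_template(template, term_pairs):
            # Assumption: only the first (English) side is scored.
            if pair not in seen and lm.perplexity(pair[0]) <= max_ppl:
                seen.add(pair)
                generated.append(pair)
    return generated
```

In practice the perplexity threshold would be tuned on held-out in-domain data, and a stronger language model would replace the bigram model; the sketch only shows how the three steps fit together.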
Keywords/Search Tags:Neural Machine Translation, Deep Learning, Data Enhancement, Knowledge Fusion