Research On Unknown Words Processing Method In Neural Machine Translation Using Semantic Concept

Posted on: 2019-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: S T Li
GTID: 2348330542987614
Subject: Computer Science and Technology
Abstract/Summary:
Machine Translation is the transformation of text from a source language into a target language by computer. It is one of the most challenging and comprehensive frontiers in Natural Language Processing and has significant research and application value. Neural Machine Translation is a recently proposed approach to Machine Translation. Unlike Statistical Machine Translation, Neural Machine Translation aims to build a neural network that can be jointly tuned to maximize translation performance, yielding an end-to-end neural translation model. Neural Machine Translation now holds the dominant position in Machine Translation, but it still has many problems. One of the most important is the unknown words problem, which is caused by the limited vocabulary size. How to process unknown words effectively to improve translation performance has therefore become both a difficulty and a focus of current research.

Unknown words not only damage the semantic integrity of source sentences but also adversely affect the generation of target sentences. Conventional methods usually replace unknown words according to the similarity of word vectors. These approaches handle rare words and polysemous words poorly and are difficult to adapt to raw corpora. In addition, how to integrate external knowledge such as semantic dictionaries into Neural Machine Translation to improve translation accuracy has also become a challenging research task.

To solve these problems, this thesis focuses on integrating external knowledge into Neural Machine Translation and applies semantic concepts to the processing of unknown words. The main points and contributions are as follows:

1. Integrating external knowledge into Neural Machine Translation to solve the unknown words problem. Using external semantic dictionaries improves the accuracy of replacement. It not only produces better translations of unknown words but also improves the quality of the whole translation.

2. Presenting an unknown word processing method based on monolingual semantic concepts. In the testing phase, unknown words are replaced using concepts in WordNet and a monolingual language model, which improves translation performance. Experiments show an improvement in translation quality.

3. Proposing an unknown word processing method based on bilingual semantic concepts. In the training phase, unknown word pairs in the training corpus are replaced using concepts in HowNet and bilingual language models, to obtain a Neural Machine Translation model with higher-quality parameters. In the testing phase, unknown words are replaced using concepts in HowNet and a monolingual language model to improve translation performance. Experiments show an improvement in translation quality.

This thesis processes unknown words in Neural Machine Translation by integrating semantic concepts from external semantic dictionaries. Experiments on English-to-Chinese translation show that the method not only achieves a significant improvement over the baseline Neural Machine Translation system but also offers some advantages over conventional unknown word processing methods.
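The test-time replacement idea from the abstract (look up concept-related candidates for an unknown word, then let a monolingual language model choose among them) can be illustrated with a minimal sketch. This is not the thesis's implementation. The concept table `CONCEPTS` is a hand-made stand-in for a semantic dictionary such as WordNet, the add-one-smoothed bigram model is a swapped-in toy language model, and all names (`train_bigram`, `log_prob`, `replace_unknown`), the vocabulary, and the corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for a semantic dictionary such as WordNet:
# each out-of-vocabulary word maps to concept-related candidates.
CONCEPTS = {
    "spaniel": ["dog", "animal"],
    "sedan": ["car", "vehicle"],
}

# Toy translation-model vocabulary and toy monolingual corpus (assumptions).
VOCAB = {"<s>", "</s>", "the", "a", "dog", "animal",
         "car", "vehicle", "barked", "stopped"}
CORPUS = ["the dog barked", "a dog barked",
          "the car stopped", "the vehicle stopped"]


def train_bigram(sentences):
    """Count context unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(toks[:-1])           # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    toks = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(toks, toks[1:])
    )


def replace_unknown(tokens, vocab, concepts, lm):
    """Replace each unknown token with its best-scoring in-vocabulary concept.

    Returns the rewritten sentence and a map from position to the original
    word, kept in case a later step wants to refer back to it.
    """
    unigrams, bigrams = lm
    out = list(tokens)
    replaced = {}
    for i, tok in enumerate(tokens):
        if tok in vocab:
            continue
        candidates = [c for c in concepts.get(tok, []) if c in vocab]
        if not candidates:
            out[i] = "<unk>"                 # no concept available: fall back
            continue
        out[i] = max(
            candidates,
            key=lambda c: log_prob(out[:i] + [c] + out[i + 1:],
                                   unigrams, bigrams, len(vocab)),
        )
        replaced[i] = tok
    return out, replaced


lm = train_bigram(CORPUS)
print(replace_unknown("the spaniel barked".split(), VOCAB, CONCEPTS, lm))
```

Scoring the whole sentence rather than the candidate alone lets the surrounding context decide between candidates, which is the role the abstract assigns to the language model. A real system would draw candidates from WordNet or HowNet and use a much stronger language model.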
Keywords/Search Tags: Neural Machine Translation, Semantic Concept, Unknown Words