
A Study On Unknown Words Processing In Mongolian-Chinese Neural Machine Translation

Posted on: 2020-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: S G W Ha
GTID: 2428330596471427 | Subject: Computer application technology
Abstract/Summary:
Neural machine translation (NMT) is a machine translation approach based on the encoder-decoder architecture. It performs very well on translation tasks and has therefore become a focus of current machine translation research. To reduce computation time and memory consumption, NMT systems usually limit the size of the vocabulary: every word outside the vocabulary is mapped to a single unified symbol, and that symbol is what takes part in training the translation model. Words represented by this unified symbol are called unknown words. Because the symbol carries no information about the word it stands for, a sentence containing unknown words can lose part of its meaning, which ultimately lowers the quality of the translation. This thesis studies the unknown word problem in Mongolian-Chinese neural machine translation systems.

(1) On an attention-based Mongolian-Chinese NMT system, three unknown word processing strategies are applied: one based on semantic similarity, one combined with a language model, and one based on a Mongolian-Chinese alignment dictionary. Unknown word processing experiments and extended corpus experiments were carried out. The strategy based on the Mongolian-Chinese alignment dictionary performs best, with BLEU and NIST scores of 0.6332 and 9.1562, respectively.

(2) A Transformer-based Mongolian-Chinese NMT system was built on the TensorFlow platform, and morpheme-based translation experiments were conducted. The best results come from partial segmentation of Mongolian combined with word segmentation of Chinese, with BLEU and NIST scores of 0.6841 and 9.5922, respectively.

(3) On the Transformer-based Mongolian-Chinese NMT system, unknown words are processed with the same three strategies: replacement based on semantic similarity, replacement combined with a language model, and replacement based on the Mongolian-Chinese alignment dictionary. Unknown word processing experiments and extended corpus experiments were again carried out. In this setting the semantic similarity-based strategy performs best, with BLEU and NIST scores of 0.7429 and 10.2044, respectively.

(4) Comparing the attention-based and the Transformer-based Mongolian-Chinese NMT systems, the Transformer-based system achieves better translation performance. After unknown word processing, the best model overall is the Transformer-based Mongolian-Chinese NMT model combined with the semantic similarity-based unknown word processing method.
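The abstract does not describe how the semantic similarity strategy chooses a replacement for an unknown word. One common way to realize the idea, assumed here rather than taken from the thesis, is to swap each out-of-vocabulary source word for its most similar in-vocabulary word, measured by cosine similarity of word embeddings, before translation. The helper `replace_unknown_words`, the threshold `min_sim`, the `<unk>` symbol, the toy 3-dimensional vectors, and the English placeholder tokens (standing in for Mongolian words) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_unknown_words(tokens, vocab, embeddings, unk="<unk>", min_sim=0.5):
    """Swap each out-of-vocabulary token for its most similar in-vocabulary word.

    Hypothetical helper (not the thesis's code): `embeddings` is assumed to
    cover more words than the limited translation vocabulary `vocab`, e.g.
    vectors trained on a larger monolingual corpus. Tokens with no vector, or
    whose best match scores below `min_sim`, fall back to the unified `unk`.
    Returns the new token list and {position: original token} for every swap.
    """
    candidates = [w for w in vocab if w in embeddings]
    output, replaced = [], {}
    for i, tok in enumerate(tokens):
        if tok in vocab:
            output.append(tok)  # known word: keep unchanged
            continue
        vec = embeddings.get(tok)
        best, best_sim = None, -1.0
        if vec is not None:
            # Linear scan for the nearest in-vocabulary neighbour.
            for word in candidates:
                sim = cosine(vec, embeddings[word])
                if sim > best_sim:
                    best, best_sim = word, sim
        if best is not None and best_sim >= min_sim:
            output.append(best)
            replaced[i] = tok
        else:
            output.append(unk)  # no usable neighbour: keep the unified symbol
    return output, replaced


# Toy vectors, purely illustrative.
embeddings = {
    "house": [1.0, 0.0, 0.0],
    "horse": [0.0, 1.0, 0.0],
    "water": [0.0, 0.0, 1.0],
    "dwelling": [0.9, 0.1, 0.0],  # outside the vocabulary, but has a vector
}
vocab = {"house", "horse", "water"}
tokens = ["water", "dwelling", "stallion"]  # "stallion" has no vector at all

output, replaced = replace_unknown_words(tokens, vocab, embeddings)
print(output)    # ['water', 'house', '<unk>']
print(replaced)  # {1: 'dwelling'}
```

Returning the `replaced` map is a design choice of this sketch: it records which source positions were substituted, so a later step (for example, looking up the original word in a bilingual dictionary) could repair the corresponding part of the output. The abstract does not say whether the thesis does this.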
Keywords/Search Tags: Unknown words, Mongolian-Chinese neural machine translation, Attention mechanism, Semantic similarity