
Research On Chinese-Myanmar Neural Machine Translation Method Integrating Bilingual Dictionary

Posted on: 2021-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
Full Text: PDF
GTID: 2518306200953339
Subject: Control Engineering
Abstract/Summary:
Neural machine translation (NMT) has become the mainstream approach to machine translation and performs very well on resource-rich language pairs, but it is not yet mature for low-resource language pairs such as Chinese-Burmese. In Chinese-Burmese NMT, many words in the corpus are not covered by the model vocabulary; we call these out-of-vocabulary (OOV) words. We find that introducing external knowledge helps to address the OOV problem in machine translation of low-resource languages. Focusing on the integration of bilingual dictionaries into Chinese-Burmese NMT, the main research work of this paper is as follows:

(1) Chinese-Burmese parallel sentence pair extraction method based on CNN-CorrNet. Constructing a bilingual parallel corpus is an effective way to improve machine translation quality for low-resource languages, and it provides the data basis for training NMT models. We propose a Chinese-Burmese parallel sentence pair extraction method based on CNN-CorrNet. Specifically, we first use BERT to obtain word vectors for Chinese and Burmese, and then use a convolutional neural network to represent the Chinese and Burmese sentences and capture their important features. Next, to maximize the correlation between the cross-language representations of the two languages, existing Chinese-Burmese parallel sentence pairs are used as constraints, and CorrNet (Correlational Neural Networks) projects the Chinese and Burmese sentence representations into a common semantic space. Finally, the distance between a Chinese sentence and a Burmese sentence in the common semantic space is computed to judge whether the two sentences are parallel. The experimental results show that, compared with the maximum entropy model and the Siamese network model, the F1 value of the proposed method increases by 13.3% and 5.1%, respectively.

(2) Construction method for a Chinese-Burmese bilingual dictionary based on iterative self-learning. Bilingual dictionaries are an important knowledge source for addressing the OOV problem, but most existing construction methods depend on large parallel corpora. Burmese is a low-resource language with few bilingual parallel corpus resources. To reduce the dependence on parallel corpora, this paper proposes a simple iterative self-learning method that exploits the structural similarity of the bilingual word embedding spaces. After a bilingual dictionary is learned, the context features of the candidate set are used as constraints, so that a large-scale bilingual dictionary can be extracted from comparable corpora starting from a small-scale seed dictionary, with good results. Specifically, based on a comparable Chinese-Burmese bilingual corpus, this paper uses a small-scale seed dictionary to learn the cross-language mapping through iterative self-learning and obtain Chinese-Burmese candidate sets, and then uses the context features of these sets as constraints to extract a higher-quality bilingual dictionary. The method can effectively extract Chinese-Burmese bilingual dictionaries from comparable bilingual documents on the basis of a small-scale bilingual dictionary.

(3) Chinese-Burmese neural machine translation method integrating bilingual dictionaries. Low-resource NMT currently faces the OOV problem, and OOV words degrade translation quality. Introducing external knowledge helps to address this problem in low-resource machine translation. This paper therefore proposes a Chinese-Burmese NMT method that integrates bilingual dictionaries. First, a conventional attention-based NMT model learns the soft alignment between bilingual words. Then, the existing bilingual dictionary knowledge is represented with a log-linear model to obtain statistical word alignment information. Finally, during training, the soft alignment learned by the translation model itself and the statistical alignment information represented by the prior bilingual dictionary are constrained to follow the same distribution, thereby fusing the bilingual dictionary through posterior regularization. The experimental results show that the method can effectively integrate bilingual dictionaries into Chinese-Burmese NMT and thereby address the OOV problem.

(4) Chinese-Burmese neural machine translation prototype system. Based on the theoretical research above, a prototype Chinese-Burmese NMT system is constructed. The modules of the system include a sentence input/output module, an OOV processing module, a neural machine translation module, etc.e...
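The iterative self-learning loop described in part (2) can be illustrated with a minimal sketch. The abstract does not say how the cross-language mapping is learned or how candidates are retrieved, so the sketch assumes an orthogonal (Procrustes) mapping and nearest-neighbour retrieval by cosine similarity, which are common choices for this kind of method. It also omits the context-feature constraints used to filter the candidate set. All function names are hypothetical.

```python
import numpy as np

def normalize(m):
    # Scale each row (word vector) to unit length.
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def learn_mapping(src, tgt, pairs):
    # Assumed step: orthogonal Procrustes, i.e. the orthogonal W that
    # minimises ||X W - Y||_F over the current dictionary pairs.
    x = src[[i for i, _ in pairs]]
    y = tgt[[j for _, j in pairs]]
    u, _, vt = np.linalg.svd(x.T @ y)
    return u @ vt

def induce_dictionary(src, tgt, w):
    # Assumed step: map every source word and take its nearest target word
    # by cosine similarity (rows are unit length, so a dot product suffices).
    sims = (src @ w) @ tgt.T
    return [(i, int(j)) for i, j in enumerate(sims.argmax(axis=1))]

def self_learn(src, tgt, seed, max_iters=10):
    # Alternate between fitting the mapping on the current dictionary and
    # re-inducing the dictionary, until it stops changing.
    src, tgt = normalize(src), normalize(tgt)
    pairs = list(seed)
    w = None
    for _ in range(max_iters):
        w = learn_mapping(src, tgt, pairs)
        new_pairs = induce_dictionary(src, tgt, w)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs, w
```

Here `src` and `tgt` stand for Chinese and Burmese word embedding matrices with one row per word, and `seed` is a list of `(source_index, target_index)` pairs taken from the small seed dictionary. In practice the induced pairs would be treated as candidates and filtered further, for example by the context features the abstract mentions.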
Keywords/Search Tags:Chinese-Burmese bilingual, Parallel sentence pair extraction, Out of vocabulary, Neural machine translation
Related items