In the context of China's Belt and Road Initiative, exchanges between China and Vietnam have become increasingly close, and the demand for machine translation services has increased significantly. Automatic translation between the two languages has become a strategic requirement for China's further development. At present, neural machine translation achieves good results for resource-rich language pairs, but it performs poorly on the resource-scarce Chinese-Vietnamese pair. How to fully exploit different levels of language feature knowledge to compensate for the shortage of resources is the central difficulty facing Chinese-Vietnamese neural machine translation. This work takes the different levels of language feature knowledge as its entry point and introduces them into the neural machine translation process, thereby improving the performance of Chinese-Vietnamese neural machine translation. The main research content is divided into the following four parts:

(1) Construction of a bilingual parallel corpus for Chinese-Vietnamese neural machine translation. Corpus construction is an important foundation for machine translation, because parallel corpora provide the essential training data for translation models. To translate a resource-scarce language pair, a corpus of sufficient size must first be built. This work briefly describes the overall process of crawling and preprocessing the Chinese-Vietnamese training data. After cleaning the crawled data, 144K Chinese-Vietnamese parallel sentence pairs are obtained; 2,000 randomly selected pairs form the test set, 2,000 pairs form the validation set, and the remaining 140K pairs form the training set.

(2) A Chinese-Vietnamese neural machine translation method based on different levels of language feature knowledge is proposed. In neural machine translation tasks, segmenting the corpus at different levels of language features
is a very important preprocessing step. The coarser the segmentation granularity, the more completely local features are preserved in the segmentation result, but the more severe the data sparseness problem becomes; the finer the granularity, the fewer local features each unit contains, but the data sparseness problem is alleviated to a certain extent. This work studies the effects of different levels of language feature knowledge on Chinese-Vietnamese neural machine translation. First, by analyzing the linguistic characteristics of Vietnamese, Vietnamese text is divided into four levels of granularity: words, syllables, characters, and sub-words. Second, depthwise separable convolution is used to improve the neural machine translation model: a depthwise separable convolutional network is added, and convolution is applied to the input sequences of each granularity to extract more features. Experimental results show that sub-word granularity performs best in Chinese-Vietnamese neural machine translation, followed by word granularity, and character granularity performs worst.

(3) A Chinese-Vietnamese neural machine translation method with multi-level language feature knowledge is proposed. Neural machine translation performs poorly on the resource-scarce Chinese-Vietnamese pair, and making full use of different levels of language feature knowledge to compensate for the shortage of resources remains the central difficulty. In response to this problem, this work proposes a Chinese-Vietnamese neural machine translation method that integrates multi-level language feature knowledge through a fused representation of three levels of language knowledge: characters, words, and phrases. First, through the use of a bidirectional LSTM and an attention mechanism, the shallow semantic information contained in characters and words is fused, and the optional
word representation is dynamically combined on the basis of its character components. Second, a phrase tree encoder is built on top of the standard sequence encoder, so that the phrase information in the sentence is further integrated into the sequence transduction process of neural machine translation. Experimental results show that the method makes effective use of different levels of language feature knowledge and improves the performance of Chinese-Vietnamese neural machine translation to a certain extent.

(4) A Chinese-Vietnamese machine translation system integrating multi-level language feature knowledge is constructed. Using the above methods, neural machine translation is modeled at different levels of language feature knowledge, and a Chinese-Vietnamese machine translation system that integrates this multi-level knowledge is built, which has some reference value.
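
The segmentation granularities discussed in part (2) can be made concrete with a small sketch. The Python below is only an illustration, not the pipeline used in this work: it assumes the Vietnamese input has already been word-segmented, with the syllables of a multi-syllable word joined by underscores (an input convention assumed here), and the function name `segment_views` is hypothetical. Sub-word segmentation is omitted because it depends on a learned vocabulary (for example, byte-pair encoding merges) that cannot be derived from a single sentence.

```python
import unicodedata
from typing import Dict, List


def segment_views(word_segmented: str) -> Dict[str, List[str]]:
    """Return word-, syllable-, and character-level token lists.

    Assumption: the input is already word-segmented, and the syllables of a
    multi-syllable word are joined by underscores (e.g. "sinh_viên").
    """
    # Normalize to NFC so that each accented letter is a single code point.
    text = unicodedata.normalize("NFC", word_segmented)
    # Word level: whitespace-separated units, underscores kept.
    words = text.split()
    # Syllable level: split every word at its underscores.
    syllables = [syl for word in words for syl in word.split("_")]
    # Character level: split every syllable into individual characters.
    characters = [ch for syl in syllables for ch in syl]
    return {"word": words, "syllable": syllables, "character": characters}


sentence = "Tôi là sinh_viên"  # "I am a student"
views = segment_views(sentence)
for level, tokens in views.items():
    print(level, len(tokens), tokens)
```

For this sentence the sequence grows from 3 word tokens to 4 syllable tokens to 13 character tokens. This matches the trade-off described above: finer granularity gives a smaller, denser vocabulary at the cost of longer sequences in which each unit carries less local information.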