
A Neural Network-Based Optimization Method For Chinese-Malay Machine Translation

Posted on: 2024-06-25    Degree: Master    Type: Thesis
Country: China    Candidate: S Q Zhan    Full Text: PDF
GTID: 2568307124984579    Subject: Electronic information
Abstract/Summary:
Neural machine translation has become the mainstream approach, but it depends on large volumes of parallel corpus data for training, which is a serious obstacle for low-resource languages. Chinese-Malay is a typical low-resource language pair, and neural machine translation performance on it remains unsatisfactory. To improve Chinese-Malay low-resource machine translation, this thesis studies optimization methods from four angles: corpus acquisition, lightweight model design, transfer learning, and pre-trained models. The main contributions are as follows:

(1) To address the shortage of Chinese-Malay corpus data, a distributed crawler was developed to collect a Chinese-Malay parallel corpus of 180,000 sentence pairs, which was then preprocessed, cleaned, and vectorized, providing the corpus data for the subsequent machine translation research.

(2) To address the Transformer model's large parameter count and slow decoding, a TEAT model based on two-terminal attention optimization of the Transformer is proposed, which optimizes the attention mechanisms on the encoder side and the decoder side respectively. To further improve the TEAT model's translation quality, a two-end alignment transfer learning method is proposed, based on an aligned vocabulary and multi-round alignment transfer: the trained parameters of high-resource Chinese-English and English-Malay translation models are transferred to the Chinese-Malay translation model. Experiments show that the TEAT model effectively alleviates the Transformer's excessive parameter count and slow decoding while preserving translation quality, and that with the transfer learning method its BLEU score improves by 4.56 over the baseline Transformer model.

(3) To address the Transformer's poor performance in low-resource machine translation and the high cost of transfer learning, an EXDT hybrid model is proposed. The model reconstructs the Transformer encoder with the XLNet pre-trained model to strengthen input sequence modeling. An AISL algorithm is also proposed to set the optimal input sentence length adaptively, together with a "progressive thawing" optimization method that unfreezes the parameters of the EXDT network step by step to release its performance. Experiments show a BLEU improvement of 6.39 over the baseline Transformer model.

(4) Building on the above research, a Chinese-Malay low-resource neural machine translation prototype system is designed and implemented on the Vue and Flask frameworks. The optimized models proposed in this thesis (loaded with their optimal parameters) form the core translation function of the system's translation module. Finally, a comparison of the translation quality of the models demonstrates the feasibility of the optimized models.
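Two of the parameter-handling ideas summarized above — initializing the low-resource child model from vocabulary-aligned parent-model embeddings in (2), and the "progressive thawing" of frozen parameters in (3) — can be sketched in Python. This is a minimal illustration only: the function names, the toy token alignment, the embedding dimension, and the one-layer-per-interval thaw schedule are all assumptions, not the thesis implementation.

```python
import random

def transfer_embeddings(parent_emb, alignment, child_vocab, dim=4, seed=0):
    """Initialize child-model embeddings from a trained parent model.

    parent_emb : dict token -> vector from a high-resource parent
                 (e.g. a Chinese-English model).
    alignment  : dict mapping child tokens to aligned parent tokens.
    child_vocab: tokens of the low-resource (Chinese-Malay) child model.
    Aligned tokens inherit the parent vector; unaligned tokens get
    small random initial values.
    """
    rng = random.Random(seed)
    child_emb = {}
    for tok in child_vocab:
        parent_tok = alignment.get(tok)
        if parent_tok in parent_emb:
            child_emb[tok] = list(parent_emb[parent_tok])  # copy, not share
        else:
            child_emb[tok] = [rng.gauss(0.0, 0.02) for _ in range(dim)]
    return child_emb

def progressive_thaw(layer_names, epoch, thaw_every=2):
    """Return the set of layer names to unfreeze at a given epoch.

    Layers are listed bottom-up; one more layer is thawed (top-down)
    every `thaw_every` epochs, so newly attached components adapt
    before the pre-trained encoder is disturbed. The exact schedule
    is an assumed detail.
    """
    n_thawed = min(len(layer_names), 1 + epoch // thaw_every)
    return set(layer_names[-n_thawed:])

# Toy usage: "saya" inherits the vector of its aligned parent token "i".
parent = {"i": [0.1, 0.2, 0.3, 0.4]}
child = transfer_embeddings(parent, {"saya": "i"}, ["saya", "makan"])

layers = ["xlnet_embedding", "xlnet_blocks", "decoder", "output_head"]
```

In a real training loop, the set returned by `progressive_thaw` would decide which parameter groups receive gradient updates at each epoch.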
Keywords/Search Tags: natural language processing, machine translation, Transformer, XLNet, Chinese-Malay