
Research and Application of Uyghur-Chinese Machine Translation Model Based on Deep Learning

Posted on: 2022-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Wen
Full Text: PDF
GTID: 2518306476490914
Subject: Signal and Information Processing
Abstract/Summary:
Traditional Uyghur-Chinese machine translation relies mainly on a Uyghur-Chinese parallel corpus: word alignment and phrase alignment are used to train bilingual dictionaries and language models, which then produce the final translation. Uyghur is a minority language and an agglutinative language with rich morphological variation, and in practice strictly parallel Uyghur-Chinese corpora are relatively scarce. This thesis addresses two problems: large Uyghur-Chinese parallel corpora are difficult to obtain, and existing Uyghur-Chinese machine translation models do not make full use of the commonality between the Uyghur and Chinese languages. The research has two parts. First, this thesis crawls a relatively large amount of related Uyghur-Chinese material from the Internet along three dimensions: time, space, and topic. The material is organized along these dimensions into a weakly parallel Uyghur-Chinese corpus, which serves as the research basis of this thesis. Existing neural network techniques are then used to train a Uyghur-Chinese machine translation model on the weakly parallel corpus to complete the Uyghur-Chinese translation task. Second, this thesis introduces the idea of local weight sharing to improve the encoder module of the translation model: the parameters of the first five sublayers of the encoder module are shared, which makes better use of the grammatical and semantic commonality between Uyghur and Chinese. Based on the weakly parallel corpus constructed in this thesis, the corresponding translation models are trained, and empirical analysis is carried out to verify the effectiveness of the proposed model for Uyghur-Chinese translation. The BLEU score is used as the evaluation metric in further experimental analysis. The results show that the model based on the weakly parallel corpus improves the performance of Uyghur-Chinese machine translation: the translation results in the Chinese-Uyghur and Uyghur-Chinese directions improve by 1.98 and 2.13 BLEU respectively, which indicates that constructing the weakly parallel corpus is effective. The improved model with local weight sharing achieves the best translation results, at 2.51 and 2.52 BLEU in the Chinese-Uyghur and Uyghur-Chinese directions, and its perplexity (PPL) is also greatly reduced. The fidelity and fluency of the two-way Uyghur-Chinese translation results are higher, which indicates that local weight sharing can make fuller use of the commonality between the Uyghur and Chinese languages.
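For readers unfamiliar with the PPL measure mentioned above, the following is a minimal sketch of how perplexity is commonly computed from per-token log-probabilities. It illustrates the standard definition only and is not taken from the thesis; the function name `perplexity` and the sample probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the average negative natural-log probability per token.

    `token_log_probs` is a list of natural-log probabilities that a model
    assigned to each reference token (hypothetical inputs for illustration).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical example: a model that assigns probability 0.5 to every
# reference token versus one that assigns only 0.1 to every token.
confident = [math.log(0.5)] * 4
uncertain = [math.log(0.1)] * 4

ppl_confident = perplexity(confident)
ppl_uncertain = perplexity(uncertain)
```

Lower perplexity means the model assigns higher probability to the reference tokens, which is why a reduced PPL is read as a sign of more fluent output.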
Keywords/Search Tags:Uyghur-Chinese machine translation, Weakly parallel corpus, Bilingual dictionary, Language model, Local weight sharing