
Design And Implementation Of Chinese Tibetan Translation System For Primary And Secondary School Educational Resources

Posted on: 2022-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: W Yan
Full Text: PDF
GTID: 2517306746451944
Subject: Computer technology
Abstract/Summary:
As national policy on ethnic minority affairs has evolved, popularizing the national common language in ethnic minority areas has become particularly important. Helping Tibetan students learn the national common language gradually, consciously, and naturally, and use high-quality Chinese online educational resources to improve themselves, is meaningful work. Machine translation is an important research area in natural language processing, and the current mainstream approach is neural machine translation based on deep learning. Its main advantage is that it does not require feature engineering; instead, it learns language features from large-scale corpora to complete the translation task. This thesis studies Chinese-Tibetan neural machine translation, translates Chinese educational text resources into Tibetan, and then presents educational videos with Chinese and Tibetan subtitles for students to watch and learn from. In the current epidemic environment, students' normal classes are disrupted, while the videos on existing mainstream education websites are not subtitled. This work can promote the popularization of both the spoken and written forms of the national common language, improve students' cultural level, promote the exchange and integration of Chinese and Tibetan cultures, and strengthen awareness of the Chinese national community. The main work of this thesis is as follows:

(1) Corpus construction. Neural machine translation requires large amounts of data, and Chinese-Tibetan bilingual corpora are relatively scarce. This thesis formulates corpus processing standards, including corpus acquisition methods, data storage formats, and processing tool selection, and proposes a standardized processing procedure to build a Chinese-Tibetan neural translation corpus.

(2) Data augmentation. Tibetan is a low-resource language, and the existing parallel corpus is too small for neural machine translation. This thesis combines the back-translation method with the low-frequency word replacement method and uses a grammatical error correction module to augment the data. The augmented data is compared with the original data, and the results verify that the data augmentation method used in this thesis effectively improves the training results obtained from the corpus.

(3) Neural machine translation model fused with a pre-trained model. Pre-trained language models have brought large improvements to machine translation tasks. In this thesis, the autoregressive language model ELMo is integrated with the BiRNN+Attention model. The output of ELMo is spliced with the embedding matrix and then fed to the model for training, so that the model can learn more multi-sentence information, thereby improving translation quality. The experimental results show that, compared with the BiRNN+Attention model and the Transformer model, the fused pre-trained model improves the BLEU score by 1.95 and 1.2, respectively.

(4) Design and implementation of the educational resource translation system. The translation model is applied to the field of education in a working system, so that users can benefit from the technology in a more user-friendly way. First, open-source educational videos are obtained and their text is extracted. Second, the text is translated from Chinese to Tibetan by the translation model. Finally, the videos with Chinese-Tibetan bilingual subtitles are uploaded to the educational resource sharing platform, where students can visit the website and choose what to watch and study according to their own needs.
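The splicing step described in item (3) can be pictured as a per-token concatenation of two vectors along the feature dimension. The following is only a minimal sketch under stated assumptions, not the thesis implementation: the function name `splice_embeddings`, the toy dimensions, and all numeric values are hypothetical, and plain Python lists stand in for the tensors a real ELMo and BiRNN pipeline would use.

```python
from typing import List

Vector = List[float]


def splice_embeddings(static: List[Vector], contextual: List[Vector]) -> List[Vector]:
    """Concatenate each token's static embedding with its contextual vector.

    Hypothetical helper: `static` stands in for rows looked up from the
    embedding matrix, `contextual` for per-token output of a pre-trained
    language model such as ELMo.
    """
    if len(static) != len(contextual):
        raise ValueError("both inputs must cover the same number of tokens")
    # List addition joins the two feature vectors of one token end to end.
    return [s + c for s, c in zip(static, contextual)]


# Toy 3-token sentence with made-up values:
# 2-dimensional static embeddings and 3-dimensional contextual vectors.
static_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
contextual_emb = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]

# Each token is now represented by a 5-dimensional vector that would be
# passed to the encoder in place of the plain embedding.
encoder_input = splice_embeddings(static_emb, contextual_emb)
```

Concatenation keeps both sources of information intact and leaves it to the downstream encoder to weight them, at the cost of a wider input layer.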
Keywords/Search Tags:Corpus construction, Data enhancement, Neural machine translation, Educational resource translation system