Research On Tibetan-Chinese Machine Translation Under The Condition Of Sparse Resources

Posted on: 2020-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: J D Z Sang
Full Text: PDF
GTID: 2438330578464434
Subject: Computer application technology
Abstract/Summary:
Machine translation is the process of converting text in one natural language into another natural language using a computer program. Since the idea was put forward in the 1950s, research in the field has gone through many iterations of theory and technique, from rule-based methods through statistical methods to deep learning, and it remains one of the most active research directions in artificial intelligence. As one of the main topics in Tibetan natural language processing, Tibetan-Chinese machine translation research has always been closely tied to the development of computer science and technology and of the information society in China.

This thesis focuses on the data sparsity problem in Tibetan-Chinese machine translation. Based on the Transformer translation model, it proposes a large-scale iterative back-translation strategy and a mechanism for automatically translating millions of sentences of monolingual data, and benchmarks them against several strong baseline models. The resulting model gains four BLEU points, confirming the validity of the back-translation method. In addition, the thesis implements a traditional phrase-based statistical translation model and three types of baseline models based on mainstream neural network architectures, and studies segmentation for neural Tibetan-Chinese machine translation.

The main contributions of this thesis include:
· Studied and analyzed segmentation methods for neural Tibetan-Chinese machine translation, and confirmed experimentally that the word-based encoded subword segmentation model performs best.
· Implemented an end-to-end classifier that judges whether a Tibetan-Chinese sentence pair is translation-equivalent.
· Combined the translation equivalence classifier with the large-scale dual iterative back-translation strategy to obtain an effective model that uses monolingual data to improve neural machine translation performance for Tibetan-Chinese under low-resource conditions.
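The combination of dual iterative back-translation with an equivalence filter can be illustrated with a minimal, model-agnostic sketch. It is not the thesis's implementation: the function names, the `rounds` and `threshold` parameters, and the toy word-lookup "model" and scorer are all assumptions made for illustration. A real system would plug in a Transformer trainer and a learned translation equivalence classifier in their place.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                    # (source sentence, target sentence)
Translator = Callable[[str], str]         # maps a sentence to its translation
Trainer = Callable[[List[Pair]], Translator]
Scorer = Callable[[str, str], float]      # equivalence score for a sentence pair


def back_translate(mono_target: List[str], reverse_model: Translator,
                   score: Scorer, threshold: float) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    The target side is real text; the source side is produced by the reverse
    model. Pairs scoring below `threshold` are discarded.
    """
    synthetic = []
    for tgt in mono_target:
        src = reverse_model(tgt)
        if score(src, tgt) >= threshold:
            synthetic.append((src, tgt))
    return synthetic


def dual_iterative_back_translation(parallel: List[Pair], mono_a: List[str],
                                    mono_b: List[str], train: Trainer,
                                    score: Scorer, rounds: int = 2,
                                    threshold: float = 0.5):
    """Alternately improve the A->B and B->A models with filtered synthetic data."""
    flipped = [(b, a) for a, b in parallel]
    a2b, b2a = train(parallel), train(flipped)
    for _ in range(rounds):
        # Real B sentences paired with machine-made A sides, used to train A->B.
        syn_ab = back_translate(mono_b, b2a, score, threshold)
        # Real A sentences paired with machine-made B sides, used to train B->A.
        syn_ba = back_translate(mono_a, a2b, lambda b, a: score(a, b), threshold)
        a2b = train(parallel + syn_ab)
        b2a = train(flipped + syn_ba)
    return a2b, b2a


# --- Hypothetical toy stand-ins so the sketch runs end to end ---
def toy_train(pairs: List[Pair]) -> Translator:
    """Word-for-word lookup 'model'; unknown words become <unk>."""
    table = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):
            table.update(zip(s, t))
    return lambda sent: " ".join(table.get(w, "<unk>") for w in sent.split())


def toy_score(a: str, b: str) -> float:
    """Stand-in for an equivalence classifier: reject pairs containing <unk>."""
    return 0.0 if "<unk>" in (a + " " + b).split() else 1.0


parallel = [("a1 a2", "b1 b2")]           # placeholder tokens, not real text
a2b, b2a = dual_iterative_back_translation(
    parallel, mono_a=["a2 a1", "a3"], mono_b=["b2 b1", "b9"],
    train=toy_train, score=toy_score)
```

The filter matters because every round retrains on the previous round's synthetic pairs, so unfiltered errors would be fed back into both models.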
Keywords/Search Tags: Tibetan and Chinese, Machine Translation, Statistical Learning, Neural Networks, Back Translation, Data Sparsity