Research And Implementation Of Chinese-Serbian Machine Translation Based On Deep Learning

Posted on: 2022-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
GTID: 2518306473988339
Subject: Computer application technology
Abstract/Summary:
In recent years, as bilateral relations between China and Serbia have deepened, communication barriers have become increasingly prominent, and the demand for translators has outstripped the supply. Building a Chinese-Serbian machine translation system can effectively address this problem. However, learning the mapping between the two languages usually requires a large training corpus. Serbian (here, Serbian Cyrillic) is a low-resource language, so Chinese-Serbian bilingual parallel corpora are difficult to collect. Serbian also has rich morphology (for example, number, tense, and case), which makes its vocabulary severely sparse. The result is an asymmetry between the word-frequency distributions of the Chinese and Serbian vocabularies, which ultimately prevents Chinese-Serbian machine translation from reaching good translation quality.

Set against the background of deep learning and drawing on recent results in machine translation research, this thesis addresses the scarcity of Serbian corpora and the sparsity of the Serbian vocabulary by proposing a semantic-correlation-based Chinese-Serbian machine translation model. The main work is as follows:

(1) Survey and analyze the development of machine translation, its related theory, and its evaluation metrics.
(2) Because Serbian and Russian both belong to the Slavic language family and share grammatical similarities, use a Chinese-Russian neural machine translation model as a reference to determine a network structure suited to Chinese-Serbian machine translation, and train the network.
(3) During data processing, propose a semantic-correlation-based compression method for the Chinese and Serbian vocabularies to reduce vocabulary sparsity and improve the network's grasp of semantics.
(4) Propose a semantic-correlation-based translation quality evaluation method (BLEU-ws), use it as an interim criterion for model selection, and compare it experimentally with the BLEU evaluation method.
(5) Optimize the trained model and implement an API that can be called externally.

Experiments show that compressing the vocabulary based on semantic correlation and using the BLEU-ws evaluation method reduces the sparsity of the Serbian vocabulary and overcomes the asymmetry of the Chinese-Serbian word-frequency distribution. With BLEU-ws, a translation model with strong semantic comprehension can be selected during training, which significantly improves the quality of Chinese-Serbian machine translation. Finally, the evaluation results show that the optimal model selected with BLEU-ws reached a BLEU score of 20.8, which is 1.6 points higher than the BLEU score of the optimal model selected with BLEU, a relative increase of 8.3%.
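The closing figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 8.3% is measured relative to the BLEU-selected baseline. The baseline score is derived here from the two quoted numbers and is not stated in the abstract, and the variable and function names are illustrative only.

```python
# Figures quoted in the abstract.
bleu_ws_selected = 20.8  # BLEU of the model selected with BLEU-ws
gain_points = 1.6        # reported gain over the BLEU-selected model


def relative_gain(new_score: float, gain: float) -> float:
    """Return the gain as a percentage of the baseline (new_score - gain)."""
    baseline = new_score - gain
    return 100.0 * gain / baseline


# Derived value (an assumption, not quoted in the abstract).
baseline_bleu = round(bleu_ws_selected - gain_points, 1)
gain_percent = round(relative_gain(bleu_ws_selected, gain_points), 1)

print(f"baseline BLEU: {baseline_bleu}, relative gain: {gain_percent}%")
```

Under that assumption the implied baseline is 19.2 BLEU, and 1.6 / 19.2 rounds to the reported 8.3%.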
Keywords/Search Tags: Machine Translation, Serbian, BLEU, Deep Learning