
Research On Classical Chinese Neural Machine Translation With Limited Corpus

Posted on: 2019-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: X P Wang
Full Text: PDF
GTID: 2428330590474190
Subject: Computer technology
Abstract/Summary:
Nowadays, as living standards rise rapidly, more and more people are turning to the pursuit of a spiritual life. Classical Chinese is a gem of Chinese culture, and promoting classical Chinese texts brings great benefits both in learning and in life. The purpose of this study is to use a machine model to translate automatically between classical Chinese and modern Chinese, making classical Chinese more convenient to use. Because parallel corpora of classical Chinese and modern Chinese are scarce, our research addresses how to build a qualified neural machine translation model for classical Chinese when corpus resources are lacking. The research is divided into the following two parts.

The first part is the establishment of a parallel corpus of classical Chinese and modern Chinese. Building a translation model with a neural network requires an existing bilingual parallel corpus. This study proposes DSSAM (Deep Structured Semantic Alignment Model), a method for building parallel corpora with a deep semantic matching model. The model aligns sentences by matching the semantics of the classical Chinese text against its modern Chinese translation, and the aligned sentences form a bilingual parallel corpus. The study also improves the deep semantic matching model with Triplet Loss. In this paper, alignment based on semantic similarity between sentences gives much better results than previous methods based on sentence length ratio, alignment mode, and co-occurring word features.

The second part is training models using monolingual and parallel corpora. Since the automatically generated parallel corpus is not enough to train the model, this paper proposes SCCT-NMT (Semi-supervision Classical Chinese Translation Neural Machine Translation), a model that also uses monolingual corpora to train machine translation between classical Chinese and modern Chinese. It contains several design points.

First, in the framework of this model, classical Chinese and modern Chinese share the same encoder, while decoding uses two independent decoders, so the model can translate in both directions.

Second, the input receives special treatment. Word segmentation of classical Chinese uses word alignment together with an auxiliary dictionary, and part-of-speech information is added to each input.

Third, this paper proposes the CPS-Attention (Content and Part of Speech Attention) mechanism, which introduces additional part-of-speech information when attention is calculated to help the model understand the semantics of the sentence.

Fourth, at the decoding stage, this study combines a copy mechanism with a language model to assist generation. This helps with two problems in translating between classical Chinese and modern Chinese: words shared by the two languages, and generated sentences that are not fluent.

Fifth, when training on monolingual corpora, denoising and back-translation are used.

In the end, the model in this paper improves on other machine translation models when a large parallel corpus is not available.
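As an illustration of the Triplet Loss idea mentioned for sentence alignment, here is a minimal sketch in Python. The abstract does not give the exact loss, similarity measure, or margin used in DSSAM, so the cosine similarity, the hinge form, the `margin=0.2` default, and the function names `cosine`, `triplet_loss`, and `best_match` are all assumptions made for this example. The vectors stand in for sentence embeddings that a trained encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity (assumed form).

    anchor:   embedding of a classical Chinese sentence
    positive: embedding of its correct modern Chinese translation
    negative: embedding of an unrelated modern Chinese sentence
    The loss is zero once the positive is more similar to the anchor
    than the negative by at least `margin`.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def best_match(anchor, candidates):
    """Index of the candidate embedding most similar to the anchor."""
    return max(range(len(candidates)), key=lambda i: cosine(anchor, candidates[i]))


if __name__ == "__main__":
    classical = [1.0, 0.0]                               # hypothetical embedding
    modern = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]       # hypothetical candidates
    print(best_match(classical, modern))
    print(triplet_loss(classical, modern[1], modern[0]))
```

In a setup like this, training would push the loss toward zero over many triplets, and alignment would then pick, for each classical sentence, the modern candidate with the highest similarity.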
Keywords/Search Tags: limited corpus, sentence alignment, neural network, classical Chinese translation