Ancient Chinese carries much of China's wisdom and thought, yet reading ancient Chinese texts remains difficult: many ancient Chinese articles lack translations, and the cost of translating them manually is high. With the development of the Internet and artificial intelligence, research on machine translation continues to emerge, so applying machine translation models to ancient Chinese is of great significance for the digitization of ancient Chinese translations. At present, research on ancient Chinese machine translation is scarce, and no high-quality, large-scale parallel corpus is available, which has slowed progress in this area. In addition, document-level ancient Chinese machine translation deserves more attention, but most document-level machine translation models have not been widely applied. To promote the digitization of ancient Chinese translation, this dissertation presents several high-quality datasets for ancient Chinese machine translation and two Transformer-based ancient Chinese machine translation models. In summary, the core contributions of this dissertation are as follows:

·We constructed an ancient Chinese dictionary dataset containing 3930 characters, a sentence-level parallel corpus containing 1.37M ancient-modern Chinese bilingual sentence pairs, and a document-level translation corpus containing 48K bilingual paragraph pairs. We first obtained an open-source ancient Chinese dictionary and structured it to build the dictionary dataset. We also collected a large number of ancient-modern Chinese bilingual articles; after manual paragraph and sentence alignment, we extended the data by merging adjacent entries, yielding the 1.37M-pair sentence-level corpus and the 48K-pair document-level corpus.

·We proposed a three-stage machine translation model that incorporates ancient Chinese dictionary information for the ancient Chinese machine translation task. Based on the Transformer, we added a Dictionary module between the Encoder and the Decoder that incorporates word interpretations from the ancient Chinese dictionary through an attention mechanism, so the model consults the dictionary before translating (a schematic sketch of this idea is given below). This three-stage model resembles the human translation process of reading the original sentence, consulting a dictionary, and confirming the translation, and it breaks away from the traditional encoder-decoder framework. Experiments on our dataset validated the effectiveness of the model.

·We further proposed a document-level ancient Chinese machine translation model that incorporates context information through a sparse attention mechanism. Based on the Transformer, we considered the importance of context in ancient Chinese machine translation and injected different kinds of context information into the model via sparse attention, improving translation quality while keeping time and computation costs low (a sketch of such a sparse attention mask is also given below). The model decomposes the translation of a long document into sentence-by-sentence translation, so it can handle ancient Chinese articles of arbitrary length. Experiments on our dataset validated the effectiveness of the model.

In conclusion, this dissertation focused on the ancient Chinese machine translation task. First, we constructed an ancient Chinese dictionary dataset containing 3930 characters, a sentence-level parallel corpus containing 1.37M bilingual sentence pairs, and a document-level corpus containing 48K ancient-modern Chinese bilingual paragraph pairs. Second, we incorporated ancient Chinese dictionary information through an attention mechanism and proposed a three-stage machine translation model; we also incorporated context information through a sparse attention mechanism and proposed a document-level machine translation model that can handle articles of arbitrary length. Finally, experiments on our datasets validated the effectiveness of the models.
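The following is a minimal PyTorch sketch of the dictionary-lookup idea described in the second contribution: encoder states attend over embedded dictionary interpretations before being passed to the decoder. The module name, dimensions, and residual fusion are illustrative assumptions, not the dissertation's exact implementation.

```python
# Minimal sketch (assumed design): a Dictionary module placed between the
# encoder and decoder of a standard Transformer.
import torch
import torch.nn as nn

class DictionaryModule(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # Cross-attention: queries are encoder states, keys/values are
        # encoded interpretations of dictionary entries matched in the source.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, enc_out, dict_entries, dict_mask=None):
        # enc_out:      (batch, src_len, d_model)   encoder states
        # dict_entries: (batch, n_entries, d_model) dictionary interpretations
        ctx, _ = self.attn(enc_out, dict_entries, dict_entries,
                           key_padding_mask=dict_mask)
        # Residual fusion of the looked-up information.
        return self.norm(enc_out + ctx)

# Usage: the fused states replace the plain encoder output as the decoder's
# memory in an otherwise standard Transformer.
module = DictionaryModule()
enc_out = torch.randn(2, 16, 512)
dict_entries = torch.randn(2, 10, 512)
memory = module(enc_out, dict_entries)   # (2, 16, 512)
```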
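Similarly, the sketch below shows one way a sparse attention mask can restrict each token to its own sentence plus the immediately preceding one, which is the general mechanism behind translating a long document sentence by sentence with limited context. The specific context rule here is an assumption for illustration, not the dissertation's exact design.

```python
# Minimal sketch (assumed context rule): tokens attend only to their own
# sentence and the previous sentence.
import torch

def sparse_context_mask(sent_ids: torch.Tensor) -> torch.Tensor:
    # sent_ids: (seq_len,) sentence index of each token, e.g. [0,0,0,1,1,2,...]
    q = sent_ids.unsqueeze(1)          # (seq_len, 1) query sentence ids
    k = sent_ids.unsqueeze(0)          # (1, seq_len) key sentence ids
    allowed = (k == q) | (k == q - 1)  # same sentence or previous sentence
    # True marks positions that are masked out (PyTorch attn_mask convention).
    return ~allowed

sent_ids = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2])
mask = sparse_context_mask(sent_ids)            # (8, 8) boolean mask
attn = torch.nn.MultiheadAttention(512, 8, batch_first=True)
x = torch.randn(1, 8, 512)
out, _ = attn(x, x, x, attn_mask=mask)          # context-restricted self-attention
```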