Research on a Chinese-Thai Neural Machine Translation Method Based on Unsupervised Syntactic Structure Learning

Posted on: 2023-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: H T Zhang
Full Text: PDF
GTID: 2555306797473254
Subject: Computer application technology
Abstract/Summary:
With the proposal and development of China's "One Belt, One Road" initiative, communication and exchange between China and Thailand have become increasingly close, and there is high application demand for machine translation between Chinese and Thai. Existing machine translation research depends on large-scale, high-quality parallel corpora. Thai is a low-resource language and lacks large-scale Chinese-Thai parallel sentence pairs, which largely hinders the development of Chinese-Thai neural machine translation. Incorporating knowledge of syntactic structure can make translation results conform better to syntactic constraints and can partly compensate for the lack of large-scale parallel data. However, traditional methods for acquiring syntactic knowledge often rely on large-scale annotated corpora, and Thai lacks syntactically annotated training data, which makes it difficult to fully train a syntactic parsing model. In response to these problems, this thesis proposes a Chinese-Thai neural machine translation method based on unsupervised syntactic structure learning and carries out the following research:

(1) To address the lack of mature dependency parsing tools and large-scale dependency-annotated corpora for Thai, this thesis proposes an unsupervised Thai dependency parsing method based on dynamic word embedding alignment. The method trains a syntactic parser on the rich annotated corpora of other languages and acquires knowledge of Thai syntactic structure without supervision by means of transfer learning. It first obtains monolingual dynamic word vector representations for Thai and for a high-resource language and refines these representations by clustering. It then aligns the monolingual word embeddings across the two languages without supervision, pre-trains the syntactic parsing model on the training data of the resource-rich language, and uses the resulting alignment matrix to parse Thai. The experimental results show that the proposed method based on dynamic word embedding alignment can effectively achieve unsupervised Thai dependency parsing.

(2) The Thai dependency information acquired without supervision differs from gold-standard syntactic knowledge, and this gap causes the translation model to overfit noisy syntactic information. To address this, the thesis proposes a dependency distance penalty mechanism that improves the Chinese-Thai machine translation method based on dependency-aware syntactic information fusion. By assigning different weights to different parent-word distance relationship matrices, the method makes the fusion model favor the distance features between tokens that have direct dependencies, which reduces the extent to which noise in the syntactic information degrades translation quality. The experimental results show that, compared with the traditional Transformer model, the proposed method can effectively incorporate syntactic knowledge and improve translation performance.

(3) A Thai-Chinese neural machine translation prototype system based on unsupervised syntactic structure learning is implemented. This thesis designs and implements a Thai-Chinese neural machine translation prototype system that incorporates structural information. The system can produce dependency annotations for unlabeled Thai text and can translate between Chinese and Thai in both directions. For ease of use, a visual operation interface is also provided.

The method in this thesis acquires knowledge of Thai dependency syntactic structure without supervision, with a UAS close to 50%. It alleviates the interference of noisy syntactic information with translation quality, effectively integrates syntactic information into the translation model, and improves the accuracy of Thai-Chinese neural machine translation.
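The UAS figure quoted above is the unlabeled attachment score, that is, the fraction of tokens whose predicted head matches the gold head. The following is a minimal sketch of that metric, not the thesis's evaluation code. The function name `unlabeled_attachment_score`, the list-of-head-indices input format (1-based head positions, 0 for the root), and the toy data are all assumptions made for illustration.

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Return the fraction of tokens whose predicted head equals the gold head.

    Both arguments are lists of sentences; each sentence is a list of head
    indices, one per token (assumed convention: 1-based positions, 0 = root).
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted data differ in sentence count")

    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold_heads, pred_heads):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentence differ in length")
        total += len(gold_sent)
        # A token counts as correct when its predicted head matches the gold head.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))

    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Toy data (hypothetical, not from the thesis): two sentences, five tokens.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
score = unlabeled_attachment_score(gold, pred)
print(f"UAS = {score:.2f}")
```

In the toy data, four of the five predicted heads match the gold heads. Dependency labels are ignored, which is what distinguishes UAS from the labeled attachment score (LAS).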
Keywords/Search Tags: Neural machine translation, Thai-Chinese, Unsupervised, Syntax learning, Dependency parsing