A Study of Chinese-Vietnamese Statistical Machine Translation Methods That Incorporate Language Differences

Posted on: 2018-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2358330518460430
Subject: Computer technology
Abstract/Summary:
Vietnam is an important country in Southeast Asia that borders China and has frequent political and economic exchanges with it. Machine translation is one of the important branches of natural language processing research, and the study of Chinese-to-Vietnamese statistical machine translation plays an important supporting role in bilingual understanding, information retrieval, cultural exchange, and economic trade. Current Chinese-to-Vietnamese translation models are still at an early stage; existing work focuses mainly on bilingual parallel corpus construction, word alignment methods, and Vietnamese dependency trees. Vietnamese and Chinese have both similarities and differences. Both follow SVO word order. The difference is that Vietnamese modifiers (attributive, adverbial, etc.) are post-positioned relative to Chinese: in Vietnamese, attributives follow the nouns they modify, and adverbs follow the adjectives and verbs they modify. Based on this analysis, a hierarchical phrase-based model and a syntax tree-to-tree model are used to model and study these linguistic differences.

(1) A hierarchical phrase-based translation model that incorporates language differences into the lexicalized model. First, Chinese word segmentation and part-of-speech tagging tools and Vietnamese word segmentation tools are used to segment and tag the Chinese-Vietnamese parallel sentences, and word alignments are obtained with GIZA++. The initial phrase pairs are then extracted from the word alignments and generalized into rules containing nonterminals. Second, by analyzing the differences between Chinese and Vietnamese, the language features are formally defined and integrated into the lexicalized reordering model. Decoding uses the CKY algorithm. In the experiments, the hierarchical phrase-based translation model with the language-difference lexicalized model is compared against the conventional hierarchical phrase-based model under different grammars. The results show that incorporating the language-difference lexicalized model into the hierarchical phrase-based translation model improves translation quality.

(2) A Chinese-Vietnamese statistical machine translation method based on the syntax tree-to-tree translation model. The first step is to parse the sentences to generate bilingual syntax trees and then obtain word alignments with GIZA++. Using the rich phrase pairs of the phrase-based translation model, the parse trees of the source and target languages are generalized to expand the rule base. The second step is to improve rule preprocessing and the translation model by using the language differences. Decoding uses a tree parsing algorithm, with the generalization of the target language guiding the generation of candidate translations. In the experiments, BLEU scores are used to compare the phrase-level model, the syntax tree-to-tree model, and the tree-to-tree model with language characteristics. The results show that the proposed method effectively enlarges the rule base and improves translation accuracy.

(3) A prototype Chinese-Vietnamese syntax tree-to-tree translation system. The system is based on the syntax tree-to-tree translation system and uses the language differences between Chinese and Vietnamese identified in the modeling phase as features when optimizing the rule base and the translation model. It is built with several open-source tools and frameworks, such as the NiuTrans translation framework, the word segmentation and tagging tool from the academy, and GIZA++. The front end is built with Java Servlet technology, and input sentences are translated by the translation model.
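
The phrase-pair extraction and rule generalization described in part (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `extract_phrase_pairs` and `generalize`, the example sentence pair, and the hand-written word alignment (standing in for GIZA++ output) are all assumptions made for illustration. The extraction is also simplified: it keeps only the tightest target span for each source span and substitutes a single nonterminal.

```python
from typing import List, Set, Tuple

# (src_start, src_end, tgt_start, tgt_end), all indices inclusive
Span = Tuple[int, int, int, int]


def extract_phrase_pairs(n_src: int, alignment: Set[Tuple[int, int]],
                         max_len: int = 4) -> List[Span]:
    """Collect phrase pairs that are consistent with the word alignment.

    A pair is consistent when no word inside the target span is aligned
    to a source word outside the source span.
    """
    pairs: List[Span] = []
    for s1 in range(n_src):
        for s2 in range(s1, min(s1 + max_len, n_src)):
            tgt = [t for (s, t) in alignment if s1 <= s <= s2]
            if not tgt:
                continue  # skip source spans with no aligned words
            t1, t2 = min(tgt), max(tgt)
            if t2 - t1 + 1 > max_len:
                continue
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.append((s1, s2, t1, t2))
    return pairs


def generalize(src: List[str], tgt: List[str],
               outer: Span, inner: Span) -> Tuple[List[str], List[str]]:
    """Replace the inner phrase pair with the nonterminal X1 on both sides."""
    s1, s2, t1, t2 = outer
    i1, i2, j1, j2 = inner
    src_side = src[s1:i1] + ["X1"] + src[i2 + 1:s2 + 1]
    tgt_side = tgt[t1:j1] + ["X1"] + tgt[j2 + 1:t2 + 1]
    return src_side, tgt_side


# Illustrative sentence pair (an assumption, not taken from the thesis):
# Chinese "I like red flower" vs. Vietnamese "I like flower red".
SRC = ["我", "喜欢", "红", "花"]
TGT = ["tôi", "thích", "hoa", "đỏ"]
# Hand-written (source index, target index) links in place of GIZA++ output.
ALIGNMENT = {(0, 0), (1, 1), (2, 3), (3, 2)}

PAIRS = extract_phrase_pairs(len(SRC), ALIGNMENT)
# Generalize "红 花" / "hoa đỏ" by abstracting the modifier "红" / "đỏ".
RULE = generalize(SRC, TGT, (2, 3, 2, 3), (2, 2, 3, 3))

if __name__ == "__main__":
    print(PAIRS)
    print(RULE)  # source side: X1 花, target side: hoa X1
```

In the resulting rule the nonterminal precedes the noun on the Chinese side and follows it on the Vietnamese side, which is the modifier-position difference the abstract describes.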
Keywords/Search Tags:statistical machine translation, Chinese and Vietnamese, language characteristics, hierarchical phrase extraction, lexicalized model, tree to tree model, rule generalization