
Research On Synchronous Tree Substitution Grammar Based Statistical Machine Translation Methods

Posted on: 2011-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H F Jiang
GTID: 1118330338989433
Subject: Computer application technology
Abstract/Summary:
Machine translation has been studied for more than fifty years, and the dominant research direction today is the statistical approach. Over the last two decades, statistical machine translation research has moved from the classic word-based models, through the fairly mature phrase-based models, to syntax-based models. Compared with earlier models, syntax-based models have several potential advantages. They can effectively model long-distance reordering (also called global or structural reordering), they generalize better, and they can elegantly model discontiguous phrase correspondences.

However, syntax-based machine translation models are far from perfect, and many open issues still need to be addressed. This thesis focuses on several key issues in current syntax-based models and proposes solutions for them. The major contributions are as follows:

(1) A synchronous tree substitution grammar based statistical machine translation model
To address the shortcomings of state-of-the-art phrase-based models and of syntax models based on synchronous context-free grammar (SCFG), we present a translation model based on Synchronous Tree Substitution Grammar (STSG). Phrase-based models can learn local reordering and the translation of short idioms that occur often enough to be observed in the training data. Because they lack information about structural transformations, however, they cannot model global reordering or discontinuous phrases. SCFG-based syntax models allow reordering only among sibling nodes, so they cannot effectively handle correspondences between non-isomorphic tree structures. The STSG-based model can elegantly model both global reordering and discontinuous phrases.
Furthermore, because reordering between non-sibling nodes can be modeled effectively, it can learn non-isomorphic tree-to-tree mappings. Experiments on two different data sets show that the proposed model significantly outperforms both a phrase-based model and an SCFG-based model.

(2) Incorporating non-syntactic translation equivalents
Although syntax-based models can in principle model structural reordering and discontiguous phrase correspondences, they suffer from strict syntactic constraints. To relax these constraints and bring the strengths of phrase-based models into syntax-based models, this dissertation presents an SMT model based on Synchronous Tree Sequence Substitution Grammar (STSSG). The model uses a tree sequence as its basic translation unit. As a result, translation can draw on syntactic translation equivalents and also on non-syntactic translation equivalents that carry syntactic information. Experimental results on the NIST 2005 Chinese-English machine translation data set show that the proposed method achieves significant improvements over baselines that include a phrase-based model and a tree-based syntax model.

(3) Synthetic synchronous grammar based machine translation
To combine the advantages of different synchronous grammars, we present a synthetic synchronous grammar for statistical machine translation that merges a formally syntax-based synchronous context-free grammar with a linguistically syntax-based synchronous tree sequence substitution grammar. A translation model based on this grammar enlarges the translation hypothesis space by exploring heterogeneous derivations, which significantly improves system performance.

(4) Syntactic rule taxonomy and contribution analysis
This dissertation also investigates the taxonomy of translation rules. First, existing rule classifications are discussed.
Then, we present a comprehensive taxonomy of translation rules from several different perspectives. Using two typical syntax-based translation models, we carry out thorough studies to empirically examine the contributions of the different rule categories. We also design a metric to measure the cost-effectiveness of each rule category.
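The following toy sketch makes the non-sibling reordering from contribution (1) concrete. It applies one STSG rule by synchronous tree substitution. The `Node` encoding, the function names (`frontier`, `var_order`, `are_siblings`, `substitute`, `apply_rule`), the IP/S rule, and the romanized example sentence are all hypothetical illustrations chosen here. This is not the dissertation's implementation, and it omits rule extraction, probabilities, and decoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A parse-tree node; `var` marks a linked substitution site (frontier nonterminal)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    var: Optional[int] = None


def frontier(tree: Node) -> List[str]:
    """Left-to-right yield of a tree (labels of its leaves)."""
    if not tree.children:
        return [tree.label]
    return [leaf for child in tree.children for leaf in frontier(child)]


def var_order(tree: Node) -> List[int]:
    """Left-to-right order of the substitution sites in an elementary tree."""
    if tree.var is not None:
        return [tree.var]
    return [v for child in tree.children for v in var_order(child)]


def are_siblings(tree: Node, a: int, b: int) -> bool:
    """True if substitution sites a and b share the same parent node."""
    parent_of: Dict[int, int] = {}

    def walk(node: Node) -> None:
        for child in node.children:
            if child.var is not None:
                parent_of[child.var] = id(node)
            walk(child)

    walk(tree)
    return parent_of[a] == parent_of[b]


def substitute(tree: Node, fillers: Dict[int, Node]) -> Node:
    """Tree substitution: replace each substitution site with its filler tree."""
    if tree.var is not None:
        filler = fillers[tree.var]
        assert filler.label == tree.label, "filler root must match the site label"
        return filler
    return Node(tree.label, [substitute(c, fillers) for c in tree.children])


def apply_rule(src: Node, tgt: Node,
               pairs: Dict[int, Tuple[Node, Node]]) -> Tuple[Node, Node]:
    """Synchronous substitution: fill linked sites on both sides at once."""
    src_fill = {k: s for k, (s, _) in pairs.items()}
    tgt_fill = {k: t for k, (_, t) in pairs.items()}
    return substitute(src, src_fill), substitute(tgt, tgt_fill)


# Hypothetical rule: the ADVP under the source VP moves in front of the subject NP.
SRC_RULE = Node("IP", [Node("NP", var=1),
                       Node("VP", [Node("ADVP", var=2), Node("VV", var=3)])])
TGT_RULE = Node("S", [Node("ADVP", var=2), Node("NP", var=1),
                      Node("VP", [Node("VB", var=3)])])

# Hypothetical, already-translated sub-derivations (source side romanized).
PAIRS = {
    1: (Node("NP", [Node("ta")]), Node("NP", [Node("he")])),
    2: (Node("ADVP", [Node("zuotian")]), Node("ADVP", [Node("yesterday")])),
    3: (Node("VV", [Node("laile")]), Node("VB", [Node("came")])),
}

if __name__ == "__main__":
    src_tree, tgt_tree = apply_rule(SRC_RULE, TGT_RULE, PAIRS)
    print(frontier(src_tree), "->", frontier(tgt_tree))
    print("sites 1 and 2 siblings in source:", are_siblings(SRC_RULE, 1, 2))
```

Each side of the rule is a multi-level elementary tree, not a one-level rewrite of a parent into its children. This lets site 2 start under the source VP and still land before the subject NP on the target side, even though sites 1 and 2 are not siblings in the source tree. Under the abstract's description, a sibling-only SCFG rule cannot express this reordering in a single step.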
Keywords/Search Tags: statistical machine translation, synchronous tree substitution grammar, non-isomorphic correspondence modeling, syntactic constraint, translation rule taxonomy