Research On Dependency-based Statistical Machine Translation

Posted on:2012-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

GTID:2218330368991828

Subject:Computer application technology

Abstract/Summary:

Syntax-based statistical machine translation has become an active research area in recent years. Compared with the traditional phrase-based approach, a syntax-based model can integrate more linguistic knowledge and thus provide more guidance during translation. This thesis describes the design and implementation of a statistical machine translation system built on a dependency-based translation model, and proposes several improvements to that system.

First, a baseline system is built on a dependency treelet-to-string translation model. In this model, translation pairs of source treelets and target strings, together with their word alignments, are learned automatically from a parsed and word-aligned corpus. Both source treelets and target strings may contain variables, so that the model can handle more dependency structures. A chart-style decoding algorithm with two basic operations is designed to carry out translation.

Second, the thesis focuses on template selection and on the attachment operation, which attaches uncovered nodes to treelets. For template selection, two methods are proposed. One adds distinguishing template features, such as a variable-number penalty and a length penalty. The other adds POS tags to the templates. Experiments show that the former improves the BLEU score by 0.0081, while the latter alone does not improve performance. However, when POS tags are used to classify templates, matching POS-tag templates first does improve the BLEU score. For the attachment operation, a statistical attachment model is constructed to guide the target-side ordering of uncovered nodes during decoding: attachment instances are first extracted from the training corpus, and features are then selected to train the model with a Maximum Entropy classifier. Experiments show that the attachment model effectively controls the order of the target string.

On the NIST MT 2005 evaluation set, the largest improvement is achieved by applying template selection and the attachment model together. With unknown words retained, the system's best result is 0.0021 BLEU higher than Moses; with unknown words removed, the BLEU score reaches 0.2540, which shows the effectiveness of the proposed methods.
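As a rough illustration of the template-selection features mentioned in the abstract, the sketch below scores competing templates with a variable-number penalty and a length penalty. It assumes these features enter a standard log-linear score alongside a translation log-probability. The abstract does not give the actual formula, so the `Template` class, the `$x` variable notation, the feature names, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical treelet-to-string template (flat target side only)."""
    target: tuple      # target tokens; variables are written "$x0", "$x1", ...
    log_prob: float    # log translation probability (assumed to be given)


def is_variable(token: str) -> bool:
    # Assumed notation: variables start with "$x".
    return token.startswith("$x")


def features(t: Template) -> dict:
    # Feature values for one template.
    return {
        "log_prob": t.log_prob,
        "var_count": sum(1 for tok in t.target if is_variable(tok)),
        "length": len(t.target),
    }


def score(t: Template, weights: dict) -> float:
    # Log-linear score: weighted sum of feature values.
    return sum(weights[name] * value for name, value in features(t).items())


def select_template(templates: list, weights: dict) -> Template:
    # Pick the highest-scoring template among the candidates.
    return max(templates, key=lambda t: score(t, weights))


if __name__ == "__main__":
    # Illustrative weights only; in practice they would be tuned on held-out data.
    weights = {"log_prob": 1.0, "var_count": -0.5, "length": -0.1}
    candidates = [
        Template(target=("$x0", "of", "$x1"), log_prob=-1.0),
        Template(target=("president", "of", "China"), log_prob=-1.2),
    ]
    print(select_template(candidates, weights))
```

With negative weights on `var_count` and `length`, a fully lexicalized template can outrank a more general one even when its translation probability is slightly lower, which is one way such penalties can help distinguish between templates.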

Keywords/Search Tags:

statistical machine translation, dependency grammar, translation model, template selection, attachment model

Related items

1	Chinese-based Dependency Grammar - The Naxi Language Statistical Machine Translation Research
2	The Application Of Dependency Grammar In Chinese-to-English Statistical Machine Translation
3	Research On Dependency-to-String Model For Japanese To Chinese Statistical Machine Translation
4	Research Of Phrase-based Translation Model Using Syntactic And Morphologic Information
5	The Study On Phrase-Based Statistical Machine Translation System
6	Research Of Optimization Methods Integration And Translation Rerank For Mongolian-chinese Machine Translation
7	Implementation And Analysis Of Tree To String Alignment Template Model In Statistical Machine Translation
8	A Statistical Machine Translation System Between Mongolian And Chinese
9	Research On Synchronous Tree Substitution Grammar Based Statistical Machine Translation Methods
10	Research On Dependency-to-String Model For Chinese To English Example-Based Machine Translation