Research On Dependency-based Statistical Machine Translation

Posted on: 2012-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2218330368991828
Subject: Computer application technology
Abstract/Summary:
Syntax-based statistical machine translation has become an active research area in recent years. Compared with the traditional phrase-based approach, a syntax-based model can integrate more linguistic knowledge and thus provide more guidance during translation.

Building on a dependency-based translation model, this thesis describes the design and implementation of a dependency-based statistical machine translation system and proposes several improvements to it.

First, a baseline system based on a dependency treelet-to-string translation model is built. In this model, translation pairs of source treelets and target strings, together with their word alignments, are learned automatically from a parsed and word-aligned corpus. Both source treelets and target strings may contain variables, so that the model can handle more dependency structures. A chart-style decoding algorithm with two basic operations is designed to carry out the translation process.

Second, the thesis focuses on template selection and on the attachment operation, which attaches uncovered nodes to treelets. For template selection, two methods are proposed. The first adds distinguishing template features, such as a variable-number penalty feature and a length penalty feature. The second adds POS tags to the templates. Experiments show that the first method improves the BLEU score by 0.0081, whereas the second does not improve performance by itself. However, when POS tags are used to classify templates and the POS-tagged templates are matched first, the BLEU score does improve.

For the attachment operation, a statistical attachment model is constructed to guide the ordering of the target strings of uncovered nodes during decoding. Attachment instances are first extracted from the training corpus, and features are then selected to train the model with a Maximum Entropy classifier. Experiments show that the attachment model effectively controls the word order of the target string.

On the NIST MT 2005 evaluation set, a large improvement is achieved by applying template selection and the attachment model together. With unknown words retained, the best result of the system is 0.0021 BLEU higher than that of Moses; with unknown words removed, the BLEU score reaches 0.2540. These results show the effectiveness of the proposed methods.
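As an illustration of the template-selection idea, the following minimal sketch scores candidate templates with a log-linear combination of a translation probability, a variable-count feature, and a target-length feature. It is not the implementation from the thesis: the `Template` layout, the feature names, the weights, and the example probabilities are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A treelet-to-string template (hypothetical, flattened representation)."""
    source: tuple  # source treelet tokens; variables are written "X1", "X2", ...
    target: tuple  # target string tokens; may reuse the same variables
    prob: float    # translation probability (made-up value for illustration)


def is_variable(token):
    """Return True for placeholder tokens such as 'X1' or 'X2'."""
    return token.startswith("X") and token[1:].isdigit()


def features(template):
    """Compute the feature values of one template (feature names are assumed)."""
    return {
        "log_prob": math.log(template.prob),
        "variable_count": float(sum(is_variable(t) for t in template.source)),
        "target_length": float(len(template.target)),
    }


def score(template, weights):
    """Log-linear score: weighted sum of the feature values."""
    return sum(weights[name] * value for name, value in features(template).items())


def select_best(candidates, weights):
    """Pick the highest-scoring template among those matching a treelet."""
    return max(candidates, key=lambda t: score(t, weights))


# Hypothetical weights: negative values turn the two counts into penalties.
weights = {"log_prob": 1.0, "variable_count": -0.5, "target_length": -0.1}

# Two made-up templates that could both match the same source treelet.
general = Template(("X1", "de", "X2"), ("X2", "of", "X1"), 0.4)
specific = Template(("X1", "de", "shu"), ("book", "of", "X1"), 0.5)

best = select_best([general, specific], weights)
```

With these assumed weights, the more lexicalized template `specific` is preferred because it contains fewer variables. In a real system the weights would be tuned on development data rather than set by hand.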
Keywords/Search Tags: statistical machine translation, dependency grammar, translation model, template selection, attachment model