
On Decoding Algorithms For Phrase-Based Statistical Machine Translation

Posted on: 2013-08-01 | Degree: Master | Type: Thesis
Country: China | Candidate: H Zhang | Full Text: PDF
GTID: 2298330467976200 | Subject: Computer software and theory
Abstract/Summary:
Since the 1990s, statistical machine translation (SMT) has attracted a great deal of attention, made substantial progress, and become a central topic in machine translation research. During this time, researchers have proposed many SMT models, including word-based, phrase-based, hierarchical phrase-based, and syntax-based models. Syntax-based SMT models can be further divided into three main categories: tree-to-string, string-to-tree, and tree-to-tree. Among these models, the phrase-based model is the most widely applied, because it not only achieves higher translation performance in most cases but is also robust across translation tasks for many different language pairs.

Like other statistical models, the phrase-based SMT model is data-driven: it automatically learns translation knowledge from a bilingual corpus and then applies that knowledge to translate new sentences. A fully developed phrase-based SMT system has many components, such as the data pre-processor, the word alignment module, the phrase extraction module, the phrase scoring module, the decoder, the parameter optimizer, and the data post-processor. Among these components, the phrase extraction and scoring modules learn the phrase translation table from the bilingual corpus, the parameter optimizer tunes the model parameters, and the decoder translates new source sentences. Generally, the system adopts BLEU as its evaluation metric.

The decoder is clearly the core module for translation tasks in a machine translation system. To some extent, it determines both the translation accuracy and the decoding speed. So far, many decoding algorithms suitable for the phrase-based SMT model have been investigated. Among them, the stack decoding algorithm, the Cocke-Younger-Kasami (CYK) decoding algorithm, and the shift-reduce decoding algorithm are the most common. These algorithms have different strengths and weaknesses with respect to translation accuracy and decoding speed. The stack decoding algorithm and the CYK decoding algorithm generally achieve higher translation accuracy but decode relatively slowly. By contrast, the shift-reduce decoding algorithm has a great advantage in decoding speed but lower translation accuracy. In this thesis, we describe these algorithms in detail and report experiments that compare their translation accuracy and decoding speed.

Some applications place high demands on both translation accuracy and decoding speed, and the existing decoding algorithms are either too slow or too inaccurate for them. We therefore propose a hybrid decoding strategy that combines the CYK decoding algorithm with the shift-reduce decoding algorithm. It uses the shift-reduce decoding algorithm to decode the sub-sentences separated by punctuation marks, and then uses the CYK decoding algorithm to combine the translations of the sub-sentences into the final translation of the source sentence. Our experiments show that this hybrid decoding algorithm balances translation accuracy and decoding speed well for such applications.
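The hybrid strategy described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives no details of the shift-reduce decoder, the CYK combination rules, or the scoring model, so everything here is an assumption made for illustration. In particular, `toy_segment_decoder` is a greedy longest-match lookup standing in for the fast sub-sentence decoder, `cyk_combine` fills a CYK-style chart that tries straight and inverted concatenation of adjacent spans, and `score`, `phrase_table`, and `PUNCTUATION` are hypothetical names supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# Assumed set of segment-ending punctuation tokens (illustrative only).
PUNCTUATION = {",", ";", ":", ".", "?", "!"}


def split_on_punctuation(tokens: List[str]) -> List[List[str]]:
    """Split tokens into sub-sentences; each keeps its trailing punctuation."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in PUNCTUATION:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def toy_segment_decoder(segment: List[str],
                        phrase_table: Dict[str, str]) -> List[str]:
    """Stand-in for the fast sub-sentence decoder (NOT shift-reduce):
    greedy, monotone, longest-match phrase lookup."""
    out: List[str] = []
    i = 0
    while i < len(segment):
        for j in range(len(segment), i, -1):
            key = " ".join(segment[i:j])
            if key in phrase_table:
                out.extend(phrase_table[key].split())
                i = j
                break
        else:
            out.append(segment[i])  # unknown token is passed through
            i += 1
    return out


def cyk_combine(parts: List[List[str]],
                score: Callable[[List[str]], float]) -> List[str]:
    """CYK-style chart over sub-sentence translations.
    best[(i, j)] holds the best-scoring translation of parts i..j-1."""
    n = len(parts)
    if n == 0:
        return []
    best: Dict[Tuple[int, int], Tuple[float, List[str]]] = {}
    for i in range(n):
        best[(i, i + 1)] = (score(parts[i]), parts[i])
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            candidates: List[Tuple[float, List[str]]] = []
            for k in range(i + 1, j):
                left, right = best[(i, k)][1], best[(k, j)][1]
                # Try both straight and inverted order of the two spans.
                for cand in (left + right, right + left):
                    candidates.append((score(cand), cand))
            best[(i, j)] = max(candidates, key=lambda c: c[0])
    return best[(0, n)][1]


def hybrid_decode(tokens: List[str],
                  phrase_table: Dict[str, str],
                  score: Callable[[List[str]], float]) -> List[str]:
    """Decode each punctuation-delimited segment, then combine with the chart."""
    parts = [toy_segment_decoder(seg, phrase_table)
             for seg in split_on_punctuation(tokens)]
    return cyk_combine(parts, score)
```

Under these assumptions, the expensive chart search runs over a handful of sub-sentence translations instead of every source word, which is where a speed gain over full CYK decoding would plausibly come from; a real system would replace the toy decoder and `score` with the model's own components.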
Keywords/Search Tags: statistical machine translation, phrase-based statistical machine translation, decoding algorithms