On Key Technologies For Phrase-Based Statistical Machine Translation

Posted on:2014-11-15

Degree:Master

Type:Thesis

Country:China

Candidate:Q Li

GTID:2308330473953776

Subject:Computer application technology

Abstract/Summary:

Machine translation (MT) is a technology that people have desired for hundreds of years. Ever since the computer was born in 1946, computer scientists and linguists have dreamed of using computers to translate between languages without human intervention. Over the past two decades, statistical models have been extensively investigated for MT, and statistical machine translation (SMT) systems now achieve state-of-the-art performance for many language pairs compared with other approaches to machine translation. Among all SMT models, the phrase-based model is the simplest and the most effective. In this thesis, we improve several key components of a state-of-the-art phrase-based SMT system.

In phrase-based SMT, the phrase translation table is one of the core components and is intended to solve the "word selection" problem. Building a phrase translation table currently follows a standard paradigm: a typical approach heuristically extracts all possible phrases that are consistent with the word alignment. However, a straightforward implementation of this approach tends to produce an excessive number of phrases when phrases of arbitrary length may be extracted. This thesis presents a new phrase extraction approach that recursively composes minimal phrases to learn a compact phrase table, which we call the composing-based phrase extraction method. Experimental results on Chinese-to-English translation demonstrate that the 2-composed method achieves translation performance comparable to the typical phrase extraction method while reducing the size of the phrase table by 44.3%.

Another important SMT component is the decoder, which translates a source-language sentence into its best target-language counterpart using various resources, including the translation model, the reordering model, and the language model. Based on an analysis of the CYK algorithm for decoding, we present an optimized cube pruning method that greatly reduces time and space complexity and improves translation speed while keeping translation performance comparable to the baseline.

When analyzing the translation results, we further find that many notional words are deleted within the statistical translation framework. In this thesis, we add new features to the log-linear model to alleviate this problem.

After the translation results are generated, the raw translations need to be recased and detokenized, a step we call post-processing. In this thesis, we present a new recasing method for English sentences that can be easily implemented in a left-to-right fashion and generates high-quality recasing results.

In summary, this thesis discusses the key techniques for phrase-based statistical machine translation, including the translation model, the decoder, and the post-processing module, and proposes efficient optimization techniques for them.
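
To make the "consistent with the word alignment" criterion concrete, the following is a minimal sketch of the typical heuristic extraction that the abstract describes as the standard paradigm. It is not the thesis's composing-based method, whose details the abstract does not give. The function name `extract_phrases`, the `max_len` limit, and the toy sentences are hypothetical choices made for illustration, and the sketch keeps only the tightest target span for each source span (it does not extend phrases over unaligned target words, as fuller implementations usually do).

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Return phrase pairs consistent with a word alignment (simplified sketch).

    src, tgt  : lists of tokens
    alignment : set of (i, j) pairs linking src[i] to tgt[j]
    max_len   : hypothetical cap on phrase length, in tokens
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any source word in [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # require at least one alignment link
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no target word inside [j1, j2] may be
            # linked to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example with a made-up, monotone one-to-one alignment.
src = ["wo", "xihuan", "mao"]
tgt = ["I", "like", "cats"]
alignment = {(0, 0), (1, 1), (2, 2)}
phrases = extract_phrases(src, tgt, alignment)
```

Even in this three-word toy example every contiguous source span yields a phrase pair, which illustrates why unrestricted extraction produces very large phrase tables on real corpora.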

Keywords/Search Tags:

statistical machine translation, phrase-based statistical machine translation, phrase extraction, decoder, post-processing

Related items

1	Research On Phrase-based Statistical Machine Translation
2	Study On Several Key Problems In The Training Process Of Phrase-based Statistical Machine Translation
3	Translation Knowledge Acquisition In Corpus-based Machine Translation
4	The Study On Phrase-Based Statistical Machine Translation System
5	Research And Implementation Of Hierarchical Phrase-based Translation Model In Statistical Machine Translation
6	Research And Implementation Of Hierarchical Phrase-Based Translation Model In Statistical Machine Translation
7	The Design And Realization Of A Phrase-based Statistical Chinese-English MTS
8	The Research And Application Of Phrase-Based Statistical Machine Translation System
9	The Research Of Phrase Extraction Technology For Tibetan And Chinese Statistical Machine Translation
10	Research On Some Key Aspects Of Statistical Machine Translation