Discontinuous Phrase Template Extraction And Phrase Combination In Phrase-Based Statistical Machine Translation

Posted on: 2008-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: N Duan
Full Text: PDF
GTID: 2178360245493266
Subject: Computer application technology
Abstract/Summary:
Machine Translation (MT) is the use of a computer to translate texts or utterances from one natural language into another while keeping the meaning unchanged. The MT process is a decision problem: given a source-language text, we must decide on the best matching target-language text. Among the various kinds of MT systems, Phrase-Based Statistical Machine Translation (SMT) is arguably the best-performing one.

The Phrase-Based SMT approach allows for general many-to-many relations between words. Phrases extracted from alignment matrices are listed in a phrase translation table. The context of words is thereby taken into account in the translation model, and local changes in word order between the source and target languages can be learned explicitly. On the Chinese-English translation task, Phrase-Based SMT performs significantly better than Single-Word-Based SMT.

However, this approach also has shortcomings. Because the maximum allowed length of a Chinese phrase is restricted, some fixed structures whose parts are separated by a relatively long distance cannot be extracted as a whole unit. These structures are split in Chinese, but their translations are continuous in English. Moreover, combining the separate translations of the parts does not give the same result as translating the structure as a whole unit.

We add discontinuous phrase templates and merged phrases to the phrase translation table to improve the quality of Phrase-Based SMT. The extracted templates and merged phrases are learned from a bitext without any syntactic information. In this paper, we describe the extraction and combination algorithms in detail and report a series of comparative experiments on the 2002-2005 NIST test data, using BLEU as the metric. The evaluation results show that translation quality achieves a relative improvement over the baseline Phrase-Based SMT.
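The phrase-length limitation described in the abstract can be illustrated with a minimal sketch of the standard consistency-based phrase pair extraction used in phrase-based SMT. This is not the thesis's own algorithm, which the abstract does not spell out. The function name `extract_phrases`, the parameter `max_len`, and the toy sentence pair (pinyin tokens for a "zai ... zhong" structure aligned to English "in") are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# One alignment point is (source word index, target word index).
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Alignment, max_len: int) -> List[Tuple[str, str]]:
    """Extract phrase pairs consistent with the word alignment.

    A source span is paired with the smallest target span covering all
    target words linked to it. The pair is kept only if no word in that
    target span is aligned to a source word outside the source span.
    Source spans longer than max_len words are never considered.
    (Simplified: no extension over unaligned target words.)
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            consistent = all(s_start <= s <= s_end
                             for (s, t) in alignment
                             if t_start <= t <= t_end)
            if consistent:
                pairs.append((" ".join(src[s_start:s_end + 1]),
                              " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Hypothetical toy example: "zai zhege guocheng zhong" ~ "in this process".
# Both "zai" and "zhong" align to the single English word "in".
src = ["zai", "zhege", "guocheng", "zhong"]
tgt = ["in", "this", "process"]
alignment = {(0, 0), (3, 0), (1, 1), (2, 2)}

short_limit = extract_phrases(src, tgt, alignment, max_len=3)
long_limit = extract_phrases(src, tgt, alignment, max_len=4)
```

With `max_len=3`, the "zai ... zhong" structure is never extracted: any span containing only one of its two parts is inconsistent with the alignment, and the only span containing both exceeds the length limit. Raising the limit to 4 recovers the whole pair, but longer gaps would require ever larger limits. A template with a gap, for example one pairing "zai X zhong" with "in X", is the kind of unit the abstract proposes adding instead; how the thesis extracts such templates is not stated here.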
Keywords/Search Tags:Phrase-Based SMT, Discontinuous Phrase Template, Phrase Combination, Phrase Translation Table