Research About The Reordering In Hierarchical Phrase-based SMT

Posted on:2016-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2308330476453321

Subject:Computer Science and Technology

Abstract/Summary:

This thesis focuses on improving reordering in hierarchical phrase-based SMT. It makes three contributions.

First, we propose a novel approach to pruning function word alignments from the output of an existing alignment model. The hierarchical phrase-based model is trained on an aligned parallel corpus, so alignment quality is very important to translation performance. Function words do not have clear correspondences across languages, so their alignment depends more heavily on context. As a result, function words are often aligned poorly, and incorrect function word alignments have a particularly strong effect on reordering. We first propose a language-independent function word recognition algorithm based on monolingual and bilingual frequency characteristics. Then, for each function word alignment, if the content words syntactically related to the two function words are not aligned to each other, the function word alignment is pruned. By improving the quality of function word alignment, our method improves reordering in translation.

Second, we introduce a simple and effective translation span learning model. If a phrase pair translation rule can be extracted from the aligned parallel corpus, the source span covered by that rule is a translation span. In other words, a translation span is a source span to which the translation system can apply translation rules during decoding; applying a translation rule to a source span that is not a translation span causes incorrect reordering. The model is trained on the aligned parallel corpus and then used to predict translation spans for input sentences at translation time. It is the first model proposed to learn translation spans directly.

Third, we design a word reordering model in which a series of separate sub-models reorders source word pairs at different distances. Experiments and analyses show that only the sub-models for word pairs at short distances clearly improve translation performance. Compared with the previous method, which learned reordering for all word pairs with one unified model, training our model is much more efficient, so we can use more sophisticated features and machine learning methods to learn the reordering task better. In addition, because it considers multiple alignments, our model covers more word reordering patterns than the previous model.

The latter two models approach the reordering problem in different ways, and both can be conveniently integrated into hierarchical phrase-based SMT as new features under a log-linear framework. In Chinese-to-English and Japanese-to-English translation tasks, all three methods improve translation performance significantly.
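The definition of a translation span given in the abstract can be illustrated with a minimal sketch. It assumes that "a phrase pair can be extracted" means the standard alignment-consistency condition used in phrase extraction: the span has at least one alignment link, and no target word inside the covered target range is aligned to a source word outside the span. The function names and the toy alignment are hypothetical and not taken from the thesis, and the code only labels spans in an aligned sentence pair; it does not reproduce the learned model that predicts spans for unseen input.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source index, target index), both 0-based


def is_translation_span(alignment: Set[Link], start: int, end: int) -> bool:
    """Return True if source span [start, end] (inclusive) is consistent
    with the alignment, i.e. a phrase pair could be extracted for it.
    Assumption: the standard phrase-extraction consistency check."""
    # Target positions linked to any source word inside the span.
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return False  # a span with no links cannot anchor a phrase pair
    lo, hi = min(targets), max(targets)
    # Every link landing in the target range must come from inside the span.
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


def translation_spans(alignment: Set[Link], src_len: int) -> List[Tuple[int, int]]:
    """Enumerate all translation spans of a source sentence of length src_len."""
    return [
        (i, j)
        for i in range(src_len)
        for j in range(i, src_len)
        if is_translation_span(alignment, i, j)
    ]


# Toy example (hypothetical): source words 1 and 2 swap order on the target side.
toy_alignment = {(0, 0), (1, 2), (2, 1)}
spans = translation_spans(toy_alignment, 3)
```

In this toy example the span (0, 1) is rejected because target position 1, which lies inside the covered target range, is aligned to source word 2 outside the span. Labels of this kind, computed from the aligned corpus, are one plausible form of supervision for a span predictor.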

Keywords/Search Tags:

Machine translation, word alignment, hierarchical phrase-based SMT, reordering, translation span, word reordering

Related items

1	A Study On Reordering Issues Of Phrase-Based Statistical Machine Translation
2	Alignment Based Acquisition Of Collocation And Application In Machine Translation
3	Research On The Key Technologies For Phrase-based Statistical Machine Translation Models
4	Research On Reording Model Of Tree-to-string Machine Translation
5	Research On Bilingual Corpus-Based Machine Translation
6	The Research Of Uygur And Chinese Machine Translation System Based On The Security Field
7	Research On Reordering Problems Of Hierarchical Phrase-Based Translation Model
8	Research On Reordering Problems Of Hierarchical Phrase-based Translation Model
9	Study On Several Key Problems In The Training Process Of Phrase-based Statistical Machine Translation
10	Low-Resource Machine Translation Techniques For Distant Language Pair