Research About The Reordering In Hierarchical Phrase-based SMT

Posted on: 2016-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2308330476453321
Subject: Computer Science and Technology
Abstract/Summary:
This thesis focuses on improving reordering in hierarchical phrase-based statistical machine translation (SMT). It makes three contributions.

First, we propose a novel approach to pruning function word alignments produced by an existing alignment model. The hierarchical phrase-based model is trained on a word-aligned parallel corpus, so alignment quality strongly affects translation performance. Function words often lack a clear correspondence across languages, so their alignment depends heavily on context. As a result, they are frequently misaligned, and incorrect function word alignments are particularly harmful to reordering. We first propose a language-independent algorithm that recognizes function words from monolingual and bilingual frequency characteristics. A function word alignment is then pruned if the content words syntactically related to the two function words are not aligned. By improving function word alignment quality, our method improves reordering in translation.

Second, we introduce a simple and effective translation span learning model. If a phrase-pair translation rule can be extracted from the aligned parallel corpus, the source span covered by that rule is a translation span. In other words, a translation span is a source span to which the translation system can apply translation rules during decoding; applying a translation rule to a source span that is not a translation span causes incorrect reordering. The model is trained on the aligned parallel corpus and then used to predict translation spans for input sentences at translation time. Our model is the first to learn translation spans directly.

Third, we design a more refined word reordering model. A series of separate sub-models reorders source word pairs at different distances. Experiments and analysis show that only the sub-models for word pairs at short distances clearly improve translation performance. Compared with the previous method, which learned reordering for all word pairs with one unified model, our model is much more efficient to train, so we can use more sophisticated features and machine learning methods to learn the reordering task better. In addition, because it considers multiple alignments, our model covers more word reordering patterns than the previous model.

The latter two models address the reordering problem in different ways, and both can be conveniently integrated into hierarchical phrase-based SMT as new features under a log-linear framework.

In Chinese-to-English and Japanese-to-English translation tasks, all three methods improve translation performance significantly.
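
The notion of a translation span can be illustrated with a minimal Python sketch. It assumes the standard phrase-extraction consistency criterion (a source span qualifies if some target span exists such that no alignment link crosses the boundary of the pair); the thesis may use a stricter or different definition. The function names `is_translation_span` and `translation_spans` and the `max_len` limit are hypothetical and are not taken from the thesis.

```python
from typing import List, Set, Tuple

# An alignment is a set of (source_index, target_index) links, 0-based.
Alignment = Set[Tuple[int, int]]


def is_translation_span(alignment: Alignment, start: int, end: int) -> bool:
    """Check whether source span [start, end] (inclusive) admits a phrase pair
    consistent with the word alignment (assumed criterion, see lead-in)."""
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return False  # a span with no aligned word yields no phrase pair
    lo, hi = min(targets), max(targets)
    # Consistency: every target word inside [lo, hi] must link back
    # only to source words inside [start, end].
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


def translation_spans(alignment: Alignment, source_len: int,
                      max_len: int = 10) -> List[Tuple[int, int]]:
    """Enumerate all translation spans of length at most max_len."""
    return [
        (i, j)
        for i in range(source_len)
        for j in range(i, min(source_len, i + max_len))
        if is_translation_span(alignment, i, j)
    ]


# Toy example: source words 1 and 2 swap order on the target side.
toy_alignment = {(0, 0), (1, 2), (2, 1)}
spans = translation_spans(toy_alignment, source_len=3)
```

In the toy alignment, the span covering source words 0 and 1 is not a translation span, because its target range also contains a word linked to source word 2. Spans labelled this way from an aligned corpus are the kind of training signal a span-prediction model could learn from.
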
Keywords: Machine translation, word alignment, hierarchical phrase-based SMT, reordering, translation span, word reordering