Integrating parsing and word alignment in syntax-based machine translation

Posted on:2011-08-18

Degree:Ph.D

Type:Thesis

University:University of Michigan

Candidate:Fossum, Victoria L

GTID:2468390011972388

Subject:Computer Science

Abstract/Summary:

Training a state-of-the-art syntax-based statistical machine translation (MT) system to translate from a source language into a target language requires a large parallel corpus of example sentences in the source language translated into the target language by a human; a word alignment (word-to-word correspondence between each source-target sentence pair); and a parse tree (syntactic representation) of each sentence in the source language, target language, or both. From these resources, the string-to-tree syntax-based MT system used in this thesis [34, 33] acquires rules governing the process of translating a source string into a target parse tree. After training, these rules are used to translate previously unseen source sentences into the target language.

The parallel corpora used to train current state-of-the-art systems are too large for manual annotation; instead, word alignment and parsing must be performed automatically. There are two problems with current approaches to automatic word alignment and parsing. First, both processes introduce errors that propagate through the pipeline. Improving the accuracy of either process can therefore improve translation quality. Second, the two processes are typically performed independently. Since each process produces constraints that can be used to guide the other, we can improve the accuracy of both processes by integrating them more closely. Word alignment and parsing jointly determine the set of translation rules acquired by a system during training, so it is desirable to optimize them both in order to produce the best translation rules possible.

In this thesis, we address these two problems as follows. First, we recombine the output of multiple parsers, improving parse and translation quality. Second, we use features of the word alignment to correct parse errors. Third, we use features of the parse trees to correct word alignment errors, improving alignment and translation quality. Fourth, we integrate word alignment and parsing by producing n-best lists of candidates for each process, and discriminatively reranking (word alignment/parse tree) pairs to optimize the quality of the extracted translation rules.

Our results demonstrate that integrating word alignment and parsing improves the accuracy of each process, and in some cases improves translation quality relative to a state-of-the-art syntax-based MT system.
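
The idea of reranking (word alignment, parse tree) pairs drawn from two n-best lists can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names, the three features, the hand-set weights, and the representation of an alignment as a set of (source index, target index) links and of a parse as a list of target-side constituent spans. The "consistent" feature counts constituents whose aligned source span contains no word linked outside the constituent, which is one plausible way to measure how well an alignment and a parse agree; a real system would learn its weights discriminatively rather than set them by hand.

```python
from itertools import product


def aligned_source_span(links, start, end):
    """Return (min, max) source indices linked to target words in [start, end), or None."""
    src = [s for (s, t) in links if start <= t < end]
    return (min(src), max(src)) if src else None


def consistent_constituents(links, spans):
    """Count constituents whose aligned source span has no link leaving the constituent."""
    count = 0
    for start, end in spans:
        span = aligned_source_span(links, start, end)
        if span is None:
            continue  # unaligned constituent: not counted (an arbitrary choice here)
        lo, hi = span
        violated = any(
            lo <= s <= hi and not (start <= t < end) for (s, t) in links
        )
        if not violated:
            count += 1
    return count


def rerank(alignments, parses, weights):
    """Score every (alignment, parse) pair with a linear model and return the best.

    alignments: list of (model_score, set of (src, tgt) links)
    parses:     list of (model_score, list of (start, end) target spans)
    weights:    dict mapping the hypothetical feature names to floats
    """
    best, best_score = None, float("-inf")
    for (a_score, links), (p_score, spans) in product(alignments, parses):
        features = {
            "alignment_score": a_score,
            "parse_score": p_score,
            "consistent": consistent_constituents(links, spans),
        }
        total = sum(weights[name] * value for name, value in features.items())
        if total > best_score:
            best, best_score = (links, spans), total
    return best, best_score
```

Scoring the full cross product of the two lists costs n × m feature evaluations per sentence, which is acceptable for short n-best lists but would need pruning for long ones.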

Keywords/Search Tags:

Translation, Word alignment, Syntax-based, Target language, Parsing, System, Each process, State-of-the-art

Related items

1	Modeling syntax for parsing and translation
2	Research On Syntax Parsing Using Neural Networks
3	Study On Word Alignment For Re-ordering Of Web-mined OOV Translation Candidates
4	Research On Word Alignment In Statistical Machine Translation
5	Bean Soup Translation: Flexible, Linguistically-motivated Syntax for Machine Translation
6	Low-Resource Machine Translation Techniques For Distant Language Pair
7	Research On Chinese Word Segmentation Strategies For Statistical Machine Translation
8	Research On Word Alignment And Its Use On Automatical Evaluation Technology Of Translation Quality
9	SQL Parsing And Translation Oriented To Database Security
10	Research Of Some Key Issues In Highly Adaptive Example-Based Machine Translation