Font Size: a A A

Contributions to research on machine translation

Posted on:2007-04-21Degree:Ph.DType:Thesis
University:University of California, San DiegoCandidate:Kauchak, DavidFull Text:PDF
GTID:2455390005486402Subject:Artificial Intelligence
Abstract/Summary:
In the past few decades machine translation research has made major progress. A researcher now has access to many systems, both commercial and research, of varying levels of performance. In this thesis, we describe different methods that leverage these pre-existing systems as tools for research in machine translation and related fields.; We first examine techniques for improving a translation system using additional text. The first method uses a monolingual corpus. Discrepancies are identified by translating a word list to a foreign language and back again. Entries where the original word and its double translation differ are used to learn word-level correction rules. The second method uses parallel bilingual data consisting of source language/target language sentence pairs. The source sentences are translated using a translation system, and a partial alignment is identified between the machine-translated sentences and the corresponding human-translated sentences in the target language. This alignment is used to generate phrase-level correction rules. Experimentally, both word-level and phrase-level correction rules result in improved translation performance. The learned word-level correction rules make 24,235 corrections on 20,000 Spanish to English translated sentences, with high accuracy. The learned phrase-level rules improve the translation performance (as measured by BLEU) of a French to English commercial system by 30%, and of a state of the art phrase-based system in a statistically significantly way.; To train current statistical machine translation systems, bilingual examples of parallel sentences are used. Generating this data is costly, and currently feasible only in limited domains and languages. A fundamental question is whether every potential example is equally useful. We describe a ranking method for examples that scores individual sentence pairs based on the performance of translation systems trained on random subsets of the examples. When used to train a translation system, the top ranking examples result in a significantly better performing system than random selection of examples. Given these ranked examples, a model of example usefulness can potentially be learned to select the most useful unlabeled examples. Initial experiments show two previously used example features are good candidates for identifying useful examples.; In the last part of this thesis we describe how automatic paraphrasing methods can be used to improve the accuracy of evaluation measures for machine translation. Given a human-generated reference sentence and a machine-generated translated sentence, we present a method that finds a paraphrase of the reference sentence that is closer in wording to the machine output than the original reference is. We show that using paraphrased reference sentences for evaluating a translation system output results in better correlation with human judgement of translation adequacy than using the original reference sentences.
Keywords/Search Tags:Translation, System, Sentences, Reference, Correction rules, Using, Examples
Related items