Contributions to research on machine translation

Posted on:2007-04-21

Degree:Ph.D

Type:Thesis

University:University of California, San Diego

Candidate:Kauchak, David


GTID:2455390005486402

Subject:Artificial Intelligence

Abstract/Summary:

Over the past few decades, machine translation research has made major progress. A researcher now has access to many systems, both commercial and research, of varying levels of performance. In this thesis, we describe different methods that leverage these pre-existing systems as tools for research in machine translation and related fields.

We first examine techniques for improving a translation system using additional text. The first method uses a monolingual corpus. Discrepancies are identified by translating a word list to a foreign language and back again. Entries where the original word and its double translation differ are used to learn word-level correction rules. The second method uses parallel bilingual data consisting of source language/target language sentence pairs. The source sentences are translated using a translation system, and a partial alignment is identified between the machine-translated sentences and the corresponding human-translated sentences in the target language. This alignment is used to generate phrase-level correction rules. Experimentally, both word-level and phrase-level correction rules result in improved translation performance. The learned word-level correction rules make 24,235 corrections on 20,000 Spanish-to-English translated sentences, with high accuracy. The learned phrase-level rules improve the translation performance (as measured by BLEU) of a French-to-English commercial system by 30%, and that of a state-of-the-art phrase-based system in a statistically significant way.

To train current statistical machine translation systems, bilingual examples of parallel sentences are used. Generating this data is costly, and currently feasible only in limited domains and languages. A fundamental question is whether every potential example is equally useful. We describe a ranking method for examples that scores individual sentence pairs based on the performance of translation systems trained on random subsets of the examples. When used to train a translation system, the top-ranking examples result in a significantly better-performing system than random selection of examples. Given these ranked examples, a model of example usefulness can potentially be learned to select the most useful unlabeled examples. Initial experiments show that two previously used example features are good candidates for identifying useful examples.

In the last part of this thesis, we describe how automatic paraphrasing methods can be used to improve the accuracy of evaluation measures for machine translation. Given a human-generated reference sentence and a machine-generated translated sentence, we present a method that finds a paraphrase of the reference sentence that is closer in wording to the machine output than the original reference is. We show that using paraphrased reference sentences to evaluate a translation system's output results in better correlation with human judgement of translation adequacy than using the original reference sentences.
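
The example-ranking idea summarized in the abstract (scoring each sentence pair by the performance of systems trained on random subsets) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual procedure: the function names `rank_examples` and `toy_score`, the choice to average the scores of the subsets that contain an example, and the toy scoring function that stands in for "train a translation system and measure BLEU" are all hypothetical.

```python
import random
from collections import defaultdict


def rank_examples(examples, train_and_score, n_subsets=200, subset_size=3, seed=0):
    """Rank example indices by the mean score of the random subsets containing them.

    `train_and_score` is a hypothetical callback: it takes a list of examples,
    trains a system on them, and returns a quality score (e.g. BLEU).
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_subsets):
        # Draw a random subset of example indices (without replacement).
        subset = rng.sample(range(len(examples)), subset_size)
        score = train_and_score([examples[i] for i in subset])
        # Credit the subset's score to every example it contains.
        for i in subset:
            totals[i] += score
            counts[i] += 1
    # Examples that were never sampled sort last.
    mean = {i: totals[i] / counts[i] for i in counts}
    return sorted(range(len(examples)),
                  key=lambda i: mean.get(i, float("-inf")),
                  reverse=True)


def toy_score(subset):
    """Toy stand-in for training and evaluating a system (assumption):
    each example carries a hidden 'usefulness' and the score is their sum."""
    return sum(usefulness for _, usefulness in subset)


# Six toy "sentence pairs"; the last one is far more useful than the rest.
toy_examples = [("pair%d" % i, u) for i, u in enumerate([1, 1, 1, 1, 1, 50])]
ranking = rank_examples(toy_examples, toy_score)
print(ranking)
```

In a real setting the callback would be expensive, since each call trains and evaluates a full system, so the number and size of the subsets would have to be chosen with that cost in mind.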

Keywords/Search Tags:

Translation, System, Sentences, Reference, Correction rules, Using, Examples
