Research On Bilingual Corpus-Based Machine Translation

Posted on:2009-08-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W H Chao


GTID:1118360278456618

Subject:Computer Science and Technology

Abstract/Summary:


Research on machine translation has a long history, but translation quality has not yet reached the level that people expect. With the rapid development of computer technology and improvements in corpus construction, however, machine translation based on statistical knowledge has become feasible, and translation quality now has a chance to approach those expectations. Since the noisy channel model and, in particular, the maximum entropy model were proposed for machine translation, one of the central tasks has been to integrate more useful knowledge, especially linguistic knowledge, to further improve translation quality. This dissertation focuses on machine translation between Chinese and English. We conduct an in-depth, systematic study of how to incorporate syntactic knowledge into bilingual corpus-based machine translation, and we implement a complete system. Specifically, the dissertation covers the following topics:

1. We propose a syntax-based word alignment model. Word alignment is the foundation of statistical machine translation (SMT), and its quality strongly affects translation quality. To address the problems that arise in Chinese-English word alignment, we propose an improved word alignment model that uses syntactic knowledge to account for flexible word order within the alignment. By transforming the constraints that are implicit in the inversion transduction grammar (ITG) into explicit position judgments, we introduce the ITG into the log-linear word alignment model in an effective way. In addition, we design similarity metrics between the syntactic tree and the ITG tree and use them to integrate syntactic knowledge into the ITG-based word alignment model, so that the model can constrain complex word order within the alignment.

2. We propose a tree-tree statistical machine translation model. Because word order differs between the source sentence and the target sentence, one of the problems that SMT must solve is the reordering of the target words. We present a tree-tree SMT model. By mapping between the syntactic tree and the ITG tree, the model constrains phrase reordering at the global level. At the local level, the tree-tree model uses an ITG-based local reordering model as a feature, in which the reordering probability of two blocks is decomposed into the product of the reordering probabilities of their child blocks. The model can therefore estimate the reordering of two blocks of arbitrary length. By combining the global and local reordering models, the tree-tree model can account for the complex relationship between the source and target sentences.

3. We propose a similar-example retrieval approach based on bilingual information. When given similar translation examples, an example-based machine translation (EBMT) system will generate fluent translations. It is therefore very important for EBMT to retrieve similar examples from a large-scale corpus. We propose a novel retrieval approach that makes good use of the word alignment knowledge within the examples. To measure the similarity between the input sentence to be translated and a translation example, we design a series of similarity metrics based on the word alignment within the example. These metrics improve retrieval quality. We also design a two-level inverted index table to improve retrieval efficiency.

4. We propose an example-based statistical machine translation model. The tree-tree SMT model above considers only the source sentence and tries to make the translation conform to the syntactic tree of the source sentence. It therefore cannot ensure that the structure of the target sentence is reasonable. We present a hybrid machine translation model that extends the tree-tree model by incorporating example knowledge into SMT, in order to ensure the fluency and consistency of the translation. At the same time, we present an example-based decoder that uses both the knowledge within the translation examples and statistical knowledge to improve translation quality.
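
As an illustration of what an "explicit position judgment" for ITG constraints might look like, the following is a minimal sketch, not the dissertation's own method. It uses the standard shift-reduce test for whether a one-to-one word alignment can be derived by a binary ITG, that is, built up purely from straight and inverted combinations of adjacent blocks. The function name `is_itg_compatible` and the input format (a non-empty list in which `alignment[i]` is the target position of source word `i`) are assumptions made for this example.

```python
from typing import List, Tuple


def is_itg_compatible(alignment: List[int]) -> bool:
    """Return True if a one-to-one alignment can be derived by a binary ITG.

    alignment[i] is the target position aligned to source position i and is
    assumed to be a non-empty permutation of 0..n-1 (hypothetical format).
    """
    # Each stack entry is the target span (lo, hi) covered by a finished block.
    stack: List[Tuple[int, int]] = []
    for pos in alignment:
        stack.append((pos, pos))
        # Merge the two topmost blocks while their target spans are adjacent.
        while len(stack) >= 2:
            lo2, hi2 = stack[-1]
            lo1, hi1 = stack[-2]
            # Straight order: right block follows left block on the target side.
            # Inverted order: right block precedes left block on the target side.
            if lo2 == hi1 + 1 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # The alignment is ITG-compatible only if everything merged into one block.
    return len(stack) == 1
```

Under these assumptions, monotone and fully inverted alignments such as `[0, 1, 2]` and `[2, 1, 0]` pass the test, while the "inside-out" alignment `[1, 3, 0, 2]` fails, because no two neighbouring source words map to adjacent target positions.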

Keywords/Search Tags:

Corpus, statistical machine translation, example-based machine translation, word alignment, reordering, tree-tree translation model, similar example retrieval, example-based statistical machine translation


Related items

1	Morphology-Processing In Chinese-Mongolian Statistical Machine Translation
2	A Statistical Machine Translation System Between Mongolian And Chinese
3	Study On Word Alignment Technology And Construction Of Statistical Machine Translation Platform
4	Research On Reordering Model Of Tree-to-string Machine Translation
5	Research On Mongolian-Chinese Statistical Machine Translation Based On String To Tree Translation Model
6	Implementation And Analysis Of Tree To String Alignment Template Model In Statistical Machine Translation
7	Alignment Based Acquisition Of Collocation And Application In Machine Translation
8	Research Of Optimization Methods Integration And Translation Rerank For Mongolian-chinese Machine Translation
9	The Research On English-Chinese Name Entity Translation
10	On Learning And Decoding Approaches To Tree-to-tree Statistical Machine Translation