Research On Bilingual Corpus-Based Machine Translation

Posted on: 2009-08-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Chao
GTID: 1118360278456618
Subject: Computer Science and Technology
Abstract/Summary:
Research on machine translation has a long history, but translation quality has not yet reached the level people expect. With the rapid development of computer technology and improvements in corpus construction, machine translation based on statistical knowledge has become feasible, and translation quality now has a chance of approaching that expectation. Since the noisy channel model and especially the maximum entropy model were proposed for machine translation, one of the central tasks has been to integrate more useful knowledge, linguistic knowledge in particular, to improve translation quality further. This dissertation focuses on machine translation between Chinese and English. We carry out an in-depth, systematic study of how to incorporate syntactic knowledge into bilingual corpus-based machine translation, and we implement a complete system. In detail, the dissertation covers the following topics:

1. We propose a syntax-based word alignment model. Word alignment is the foundation of statistical machine translation (SMT), and its quality greatly affects translation quality. Considering the problems that arise in Chinese-English word alignment, we propose an improved word alignment model that uses syntactic knowledge to account for flexible word order within the alignment. By transforming the constraints that are implicit in the inversion transduction grammar (ITG) into explicit position judgments, we introduce the ITG into the log-linear word alignment model effectively. In addition, we design similarity metrics between the syntactic tree and the ITG tree and use them to integrate syntactic knowledge into the ITG-based word alignment model, so that the model can constrain complex word order within the alignment.

2. We propose a tree-tree statistical machine translation model. Because word order differs between the source sentence and the target sentence, one of the problems SMT must solve is the reordering of the target words. We present a tree-tree SMT model. By mapping between the syntactic tree and the ITG tree, the model limits the reordering of phrases in the global scope. In the local scope, the tree-tree model takes an ITG-based local reordering model as one feature, in which the reordering probability of two blocks is decomposed into the product of the reordering probabilities of their child blocks. The model can therefore estimate the reordering of two blocks of arbitrary length. By combining the global and local reordering models, the tree-tree model is able to account for the complex relationship between the source and target sentences.

3. We propose a similar-example retrieval approach based on bilingual information. When given similar translation examples, an example-based machine translation (EBMT) system can generate fluent translations, so retrieving similar examples from a large-scale corpus is very important for EBMT. We propose a novel retrieval approach that makes good use of the word alignment knowledge within the examples. To measure the similarity between the input sentence to be translated and a translation example, we design a series of similarity metrics based on the word alignment within the example. These metrics improve retrieval quality. We also design a two-level inverted index table to improve retrieval efficiency.

4. We propose an example-based statistical machine translation model. The tree-tree SMT model above considers only the source sentence and tries to make the translation conform to the syntactic tree of the source sentence, so it cannot ensure that the structure of the target sentence is reasonable. We present a hybrid machine translation model that extends the tree-tree model by incorporating example knowledge into SMT, to ensure the fluency and consistency of the translation. At the same time, we present an example-based decoder that uses both the knowledge within the translation examples and the statistical knowledge to improve translation quality.
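
As an illustration of the ITG constraint mentioned in topic 1, the following is a minimal sketch of what an explicit position judgment could look like. It is not the model described in the dissertation. It assumes a one-to-one word alignment written as a permutation, where `perm[i]` is the target position of source word `i`, and the function name `is_itg_permutation` is hypothetical. The check is the standard shift-reduce test: two neighbouring blocks may merge only if their target spans are adjacent, either in the same order (straight) or swapped (inverted).

```python
def is_itg_permutation(perm):
    """Return True if the alignment permutation can be derived by an ITG.

    perm[i] is the target position of source word i (assumed to be a
    permutation of 0..n-1; this representation is an assumption made
    for illustration).
    """
    stack = []  # each entry is a (lo, hi) span of contiguous target positions
    for target_pos in perm:
        stack.append((target_pos, target_pos))
        # Merge the top two blocks for as long as their target spans touch.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2:
                # Straight rule: the left block stays before the right block.
                stack[-2:] = [(lo1, hi2)]
            elif hi2 + 1 == lo1:
                # Inverted rule: the two blocks swap order on the target side.
                stack[-2:] = [(lo2, hi1)]
            else:
                break
    # The alignment is ITG-compatible only if everything merged into one block.
    return len(stack) == 1
```

Under these assumptions, a fully inverted alignment such as `[2, 1, 0]` is accepted, while the two "inside-out" patterns `[1, 3, 0, 2]` and `[2, 0, 3, 1]` are rejected, since no pair of neighbouring blocks in them ever has adjacent target spans.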
Keywords/Search Tags: corpus, statistical machine translation, example-based machine translation, word alignment, reordering, tree-tree translation model, similar example retrieval, example-based statistical machine translation