The Research On The Technology Of Statistical-Based Chinese-English Machine Translation

Posted on:2007-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:J Wei

GTID:2178360215470085

Subject:Computer Science and Technology

Abstract/Summary:

Over the last decade, the statistical approach has found widespread use in machine translation of both written and spoken language and has had a major impact on translation accuracy. We studied the principles of statistical machine translation and the techniques developed so far, then designed and implemented a prototype statistical Chinese-English machine translation system. Our work consists of two parts: research on statistical machine translation based on the word alignment model, and research based on the phrase alignment model.

The first part is based on the source-channel model, the most widely used model in statistical machine translation. Of the five IBM models, IBM Model 4 has been shown to produce better alignment quality than the other IBM alignment models. We therefore developed a prototype Chinese-English machine translation system based on IBM Model 4, which involved constructing the statistical translation model, building the language model, and implementing a decoder to find the best translations. The specific contributions are as follows:

1) When constructing the Chinese-English statistical translation model, we integrated part-of-speech (POS) tag information. Tests showed that the improved model yields better alignment quality and better translations than the model with monolingual word clustering.
2) One of the most important tasks is constructing the decoder. We studied the existing search algorithms used in statistical machine translation, including stack search, beam search, greedy search, and A* search. After comparing them, we adopted DP-based beam search and A* search. We tested both on Chinese-English machine translation and found that A* search performs better.
3) A* search extends only the best node, which can lead the search in a badly wrong direction in Chinese-English translation because the differences between English and Chinese are very great. We therefore introduced a partial breadth search to enlarge the search scope, together with a heuristic strategy for selecting the added nodes. Experimental results show that our method achieves better quality and efficiency.
4) In Chinese-English machine translation, we had to address the negative impact of the empty word. Because the existing formula for computing the empty word is not suited to Chinese-English machine translation, we revised it. We also set the parameters that affect the translations experimentally.

The second part builds on the first. Because the word alignment model does not take into account the context in which the source and target words appear, and has many other deficiencies, most research on statistical machine translation is now turning to phrase-based alignment models. Our contributions here are as follows:

1) We combined the Viterbi alignment obtained by training IBM Model 4 with the alignment produced by the ISA algorithm; testing showed a higher word alignment accuracy on the training corpus.
2) When integrating ISA, we defined the formula for computing the MI (point-wise mutual information) as well as its threshold.
3) We propose constructing the alignment template using POS tag information so as to take word context into account, which the word alignment model ignores.
4) Given the high accuracy of the word alignment, we extracted phrase pairs from the training corpus and used a translation memory method when decoding. Experiments showed that both the efficiency and the accuracy of translation increased.
5) We used the alignment template, which can incorporate the context of the sentence.
6) To estimate translation quality, we used the IBM Model 4 formula to compute the score of the translations.

In summary, by integrating the phrase alignment model we obtained a sounder translation model, with better translation quality than statistical machine translation based on the word alignment model alone.
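
The abstract mentions computing point-wise mutual information (MI) with a threshold when integrating ISA, but does not give the formula the thesis actually used. As a rough illustration only, the sketch below uses the textbook definition, PMI(s, t) = log( p(s, t) / (p(s) p(t)) ), estimated from sentence-level co-occurrence counts. The function names `pmi_table` and `keep_above`, the toy corpus, and the threshold value are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def pmi_table(sentence_pairs):
    """Return {(src_word, tgt_word): PMI} from sentence-level co-occurrence.

    Probabilities are relative frequencies over sentence pairs: a word
    (or word pair) is counted at most once per sentence pair.
    """
    n = len(sentence_pairs)
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_counts.update(src_set)
        tgt_counts.update(tgt_set)
        pair_counts.update((s, t) for s in src_set for t in tgt_set)
    # log( (c/n) / ((cs/n) * (ct/n)) ) simplifies to log( c*n / (cs*ct) )
    return {
        (s, t): math.log((c * n) / (src_counts[s] * tgt_counts[t]))
        for (s, t), c in pair_counts.items()
    }


def keep_above(table, threshold):
    """Keep only the word pairs whose PMI reaches the threshold."""
    return {pair: score for pair, score in table.items() if score >= threshold}


# Hypothetical toy corpus: romanized Chinese tokens paired with
# lemmatized English tokens, purely for illustration.
toy_pairs = [
    (["wo", "ai", "shu"], ["i", "love", "books"]),
    (["wo", "du", "shu"], ["i", "read", "books"]),
    (["ta", "ai", "mao"], ["he", "love", "cats"]),
]

if __name__ == "__main__":
    table = pmi_table(toy_pairs)
    for pair, score in sorted(keep_above(table, 1.0).items()):
        print(pair, round(score, 3))
```

Pairs that never co-occur are simply absent from the table. In a real system the threshold would be tuned on held-out data, which matches the abstract's remark that the threshold was set alongside the formula.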

Keywords/Search Tags:

statistical machine translation, Chinese-English machine translation, translation model, alignment model, decoder, search algorithm, A* search algorithm, DP-based algorithm, beam search algorithm, pos tag, phrase-based alignment model, ISA

Related items

1	The Research On The Decodeing Algorithm In Statistical Machine Translation
2	The Study On Phrase-Based Statistical Machine Translation System
3	The Research On English-Chinese Name Entity Translation
4	Implementation And Analysis Of Tree To String Alignment Template Model In Statistical Machine Translation
5	Morphology-Processing In Chinese-Mongolian Statistical Machine Translation
6	Study On Several Key Problems In The Training Process Of Phrase-based Statistical Machine Translation
7	Principle Research And Frame Design On English-Chinese Machine Translation System
8	Study On Word Alignment Technology And Construction Of Statistical Machine Translation Platform
9	Alignment Based Acquisition Of Collocation And Application In Machine Translation
10	Research And Implementation Of Hierarchical Phrase-Based Translation Model In Statistical Machine Translation