
The Research On The Technology Of Statistical-Based Chinese-English Machine Translation

Posted on: 2007-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Wei
Full Text: PDF
GTID: 2178360215470085
Subject: Computer Science and Technology
Abstract/Summary:
In the last decade, the statistical approach has found widespread use in machine translation of both written and spoken language and has had a major impact on translation accuracy. We studied the principles of statistical machine translation and the techniques developed so far, then designed and implemented a prototype statistical Chinese-English machine translation system. Our work has two parts: research on statistical machine translation based on the word alignment model, and research based on the phrase alignment model.

The first part is based on the source-channel model, the most widely used model in statistical machine translation. Of the five IBM models, IBM Model 4 has been shown to produce better alignment quality than the other IBM alignment models. We therefore developed a prototype Chinese-English machine translation system based on IBM Model 4, which involved constructing the statistical translation model, building the language model, and implementing a decoder to find the best translations. The specific contributions are:
1) When constructing the Chinese-English statistical translation model, we integrated POS-tag information. Our tests show that the improved model yields better alignment quality and better translations than the model with monolingual word clustering.
2) One of the most important tasks is constructing the decoder. We studied the existing search algorithms used in statistical machine translation, such as stack search, beam search, greedy search and A* search, and after comparing them adopted DP-based beam search and A* search. We tested both on Chinese-English translation and found that A* search performs better.
3) A* search extends only the best node, which can lead the search in a badly wrong direction in Chinese-English translation because the two languages differ so greatly. We therefore introduced a partial breadth search to enlarge the search scope and devised a heuristic strategy for selecting the added nodes. The experimental results show that our method achieves better quality and efficiency.
4) In Chinese-English machine translation we had to account for the adverse effects of the empty word. Because the existing formula for computing the empty word is not suited to Chinese-English translation, we revised it. We also set the parameters that affect the translations experimentally.

The second part builds on the first. Because the word alignment model does not take into account the context in which the source and target words appear, and has other deficiencies, most research on statistical machine translation is now turning to the phrase-based alignment model. Our contributions here are:
1) We combined the Viterbi alignment obtained by training IBM Model 4 with the alignment produced by the ISA algorithm. In our tests this gave a higher word alignment accuracy on the training corpus.
2) When integrating ISA, we set the formula for computing the MI (pointwise mutual information) as well as its threshold.
3) We propose constructing the alignment template from POS-tag information so that word context is taken into account, since the word alignment model ignores it.
4) Given the high accuracy of the word alignment, we extracted phrase pairs from the training corpus and used a translation memory method when decoding. Our experiments show that both the efficiency and the accuracy of translation increased.
5) We used the alignment template, which can capture the context of the sentence.
6) To estimate translation quality, we used the IBM Model 4 formula to compute the score of each translation.

Overall, by integrating the phrase alignment model we obtained a sounder translation model with better translation quality than statistical machine translation based on the word alignment model alone.
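
The pointwise mutual information score and threshold mentioned in item 2 of the phrase-based work can be illustrated with a short example. The abstract gives neither the formula nor the threshold value, so what follows is only a minimal sketch that assumes the standard definition PMI(s, t) = log2(P(s, t) / (P(s) P(t))), with probabilities estimated from sentence-level co-occurrence. The function names `pmi_table` and `filter_pairs`, the romanized toy corpus, and the threshold of 0.5 are all hypothetical and are not taken from the thesis; the sketch does not implement the ISA algorithm itself.

```python
import math
from collections import Counter
from itertools import product


def pmi_table(sentence_pairs):
    """Return {(src_word, tgt_word): PMI} from sentence-level co-occurrence.

    Assumes the standard definition PMI = log2(P(s, t) / (P(s) * P(t))),
    where each probability is the fraction of sentence pairs containing
    the word (or both words).
    """
    n = len(sentence_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))
    return {
        (s, t): math.log2(c * n / (src_count[s] * tgt_count[t]))
        for (s, t), c in joint_count.items()
    }


def filter_pairs(table, threshold):
    """Keep only word pairs whose PMI is at least `threshold`."""
    return {pair: score for pair, score in table.items() if score >= threshold}


# Toy, hand-made corpus with romanized Chinese tokens (hypothetical data).
corpus = [
    (["wo", "xihuan", "mao"], ["i", "like", "cats"]),
    (["wo", "xihuan", "gou"], ["i", "like", "dogs"]),
    (["ta", "xihuan", "mao"], ["he", "likes", "cats"]),
    (["ta", "you", "gou"], ["he", "has", "dogs"]),
]
table = pmi_table(corpus)
# The threshold value is an arbitrary choice for this illustration.
candidates = filter_pairs(table, threshold=0.5)
```

On this toy data the pair ("mao", "cats") scores 1.0 and is kept, while ("mao", "i") scores 0.0 and is dropped by the threshold. In practice the threshold would be tuned on held-out aligned data, as the abstract indicates was done experimentally.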
Keywords/Search Tags: statistical machine translation, Chinese-English machine translation, translation model, alignment model, decoder, search algorithm, A* search algorithm, DP-based algorithm, beam search algorithm, POS tag, phrase-based alignment model, ISA