Morphology-Processing In Chinese-Mongolian Statistical Machine Translation

Posted on: 2010-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: P Yang
Full Text: PDF
GTID: 2178360302959714
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Phrase-based statistical machine translation is a research hotspot in machine translation. For Chinese-English statistical machine translation, the conditions are favorable: the bilingual parallel corpora are adequate in size, and the basic research on morphological and syntactic analysis is mature. These conditions provide a good platform for research.

However, these conditions are still lacking for Chinese-Mongolian statistical machine translation. There are two major difficulties in this area. First, Mongolian information processing is less developed than that of Chinese: the parallel corpora are smaller, and the basic research on morphological and syntactic analysis lags behind, which limits the development of Chinese-Mongolian statistical machine translation. Second, Mongolian is an agglutinative language with extremely rich morphological changes, which makes it very different from Chinese, an isolating language. As a result, Chinese-Mongolian statistical machine translation has many problems. In particular, incorrect word surface forms and confused word order in the output sentences are prominent.

The research object of this thesis is Chinese-Mongolian statistical machine translation. Because Mongolian has rich morphological changes, morphological factors are introduced into Chinese-Mongolian statistical machine translation, and Mongolian morphological information is fully used through factored models. Because the Chinese-Mongolian parallel corpus is very small, word alignment points obtained by a dictionary-based method are merged into the IBM word alignment model, and morphological knowledge of stems is used to improve the quality of the dictionary-based word alignment. The merged word alignment points are then used to train the phrase translation model for phrase-based Chinese-Mongolian statistical machine translation, which ultimately improves translation quality.

In statistical machine translation, the translation model is the only irreplaceable model, and its importance is self-evident. To address the redundancy of the phrase pairs generated while training the phrase translation model, this thesis designs and implements a general phrase translation model filter based on statistical methods. The filter can reduce model noise using different statistical methods and effectively reduces the size of the phrase translation model, while having little impact on translation quality.

Each part of the research is accompanied by experiments that validate the effectiveness of the proposed methods. We also discuss the possibility of further in-depth study of some of the methods.
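The dictionary-based alignment step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy dictionary, the hyphen-based toy stemmer, and the choice of a plain set union for merging are all assumptions made for illustration, and the target tokens are placeholders rather than real Mongolian words.

```python
from typing import Callable, Dict, List, Set, Tuple

# An alignment point pairs a source token index with a target token index.
AlignmentPoint = Tuple[int, int]


def dictionary_alignment_points(
    src_tokens: List[str],
    tgt_tokens: List[str],
    dictionary: Dict[str, Set[str]],
    stem: Callable[[str], str],
) -> Set[AlignmentPoint]:
    """Link source token i to target token j when the stem of the
    target token is listed as a translation of the source token."""
    points: Set[AlignmentPoint] = set()
    for i, src in enumerate(src_tokens):
        candidates = dictionary.get(src, set())
        if not candidates:
            continue
        for j, tgt in enumerate(tgt_tokens):
            if stem(tgt) in candidates:
                points.add((i, j))
    return points


def merge_alignment_points(
    statistical: Set[AlignmentPoint],
    dictionary_based: Set[AlignmentPoint],
) -> Set[AlignmentPoint]:
    """Assumed merge strategy: a plain union of both point sets."""
    return set(statistical) | set(dictionary_based)


# Toy data (hypothetical): target tokens are written as STEM-suffix.
src = ["书", "封面"]
tgt = ["BOOK-gen", "COVER"]
toy_dictionary = {"书": {"BOOK"}, "封面": {"COVER"}}


def toy_stem(token: str) -> str:
    # Hypothetical stemmer: keep everything before the first hyphen.
    return token.split("-")[0]


# Points assumed to come from a statistical (IBM-style) aligner.
statistical_points = {(1, 1)}

dict_points = dictionary_alignment_points(src, tgt, toy_dictionary, toy_stem)
merged = merge_alignment_points(statistical_points, dict_points)
print(sorted(merged))
```

In this toy case, matching on the stem lets the inflected target token be linked to its dictionary entry, which a surface-form lookup would miss. How the merged points are weighted or filtered before phrase extraction is not specified in the abstract, so the sketch leaves that step out.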
Keywords/Search Tags:Statistical Machine Translation, Factored Model, Word Alignment Mergence, Translation Model Filter