
Research on Construction Methods for Morphologically Asymmetric Chinese-Mongolian Statistical Machine Translation Models

Posted on: 2012-12-27  Degree: Master  Type: Thesis
Country: China  Candidate: W Li  Full Text: PDF
GTID: 2178330338992151  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Mainstream statistical machine translation (SMT) methods are essentially word-oriented: they treat each surface word form as an atomic unit. The drawback is that they cannot make full use of the morphological information in morphologically rich languages. In morphologically asymmetric Chinese-Mongolian SMT, where the target language is highly inflected, it is often difficult to select the correct inflection. Such errors propagate to the syntactic, semantic, and pragmatic levels of the translation. In addition, the lack of a large-scale Chinese-Mongolian parallel corpus causes data sparsity, and the highly inflected target language makes this problem more acute.
This thesis takes the morphological asymmetry of Chinese-Mongolian SMT into account and addresses the model construction problem from two perspectives: morphological analysis and morphological integration. First, it presents an SMT-based morphological segmentation approach combined with a minimum constituent-context cost model, aimed at the Mongolian morphological segmentation problem. Then a factored translation model, chained machine translation, and PageRank reranking are proposed to address the asymmetric Chinese-Mongolian SMT model construction problem. The factored translation model takes the stem and suffix as a factor vector during training and completes the translation through multiple translation steps and multiple generation steps. The chained machine translation system uses morphemes as a pivot language. Finally, the PageRank reranking method improves translation performance by combining the outputs of otherwise identical SMT systems trained on alternative morphological decompositions of the target language.
Experiments indicate that, compared with the standard phrase-based machine translation model, the proposed model construction methods significantly improve translation quality.
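The abstract does not say how the reranking graph is built, so the following is only a minimal sketch of one plausible reading: each candidate translation is a node, edges are weighted by token-level Jaccard similarity (an assumption made here for illustration), and a damped PageRank power iteration favors the candidate that the other systems' outputs agree with most. The function names, the similarity measure, and the damping value are all hypothetical and are not taken from the thesis.

```python
from typing import List


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap between two hypotheses (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pagerank_scores(hyps: List[str], damping: float = 0.85, iters: int = 50) -> List[float]:
    """Score hypotheses with damped PageRank over a similarity graph."""
    n = len(hyps)
    if n == 0:
        return []
    toks = [h.split() for h in hyps]
    # Symmetric edge weights, no self-loops.
    w = [[jaccard(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Nodes with no similar neighbor spread their mass uniformly.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0.0)
        new_scores = []
        for i in range(n):
            incoming = sum(scores[j] * w[j][i] / out[j]
                           for j in range(n) if out[j] > 0.0)
            new_scores.append((1.0 - damping) / n
                              + damping * (incoming + dangling / n))
        scores = new_scores
    return scores


def best_hypothesis(hyps: List[str]) -> str:
    """Return the highest-scoring hypothesis (first one wins on ties)."""
    scores = pagerank_scores(hyps)
    return hyps[max(range(len(hyps)), key=scores.__getitem__)]
```

In this sketch, each entry of `hyps` would be the output of one system trained on a different morphological decomposition, recombined into surface words before comparison. A real system would likely use an n-gram based similarity instead of plain token sets.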
Keywords/Search Tags: Machine Translation, System Combination, morphological analysis, Mongolian, factor translation model, PageRank ReRanking