Statistical Based Chinese-Cyrillic Mongolian Machine Translation System

Posted on: 2013-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: D Su
Full Text: PDF
GTID: 2268330395966530
Subject: Computer application technology
Abstract/Summary:
Computers have become essential tools in many aspects of human life. Machine translation, the automation of the translation process by computers, has developed at an unprecedented pace, and some may even think that this technology will make "old style" human translation obsolete. As a result, many developed countries are paying attention to this field of study. Companies such as Google, MSN and Yahoo provide translation services on their websites, generating translations with statistical methods, and statistical machine translation has become dominant in the machine translation field. Statistical machine translation has evolved from the word-based level to higher levels of abstraction. Currently the best-known systems are phrase-based, and recent research has started to explore tree-based systems that use syntactic information.

The aim of this thesis is to create a Cyrillic Mongolian corpus and build a Chinese-Cyrillic Mongolian statistical machine translation system. The thesis first discusses the relevant theory of statistical machine translation and the methods and tools used to create the Cyrillic Mongolian corpus. It then presents the results of the Chinese-Cyrillic Mongolian statistical machine translation system, which is built on phrase-based translation models.

We established a development set and a test set, and then obtained a Chinese-Cyrillic Mongolian parallel corpus through two approaches, collecting and constructing. On this basis, we ran experiments on the Chinese-Cyrillic Mongolian statistical system. The translation model uses a Chinese-Cyrillic Mongolian parallel corpus of more than 60 thousand sentence pairs. In an open test on 400 sentence pairs, the BLEU and NIST scores were 0.1489 and 4.7232 on 3-gram, 0.1381 and 4.9333 on 4-gram, and 0.1194 and 4.3772 on 5-gram, respectively.
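As a rough illustration of the kind of n-gram overlap score reported in the abstract, the following is a minimal sketch of corpus-level BLEU with one reference per sentence. It is not the evaluation tool used in the thesis. The function names `ngrams` and `corpus_bleu` and the parameter `max_n` are hypothetical, and the sketch assumes sentences are already tokenized into lists of words.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, assuming exactly one reference per hypothesis.

    hypotheses, references: parallel lists of token lists.
    max_n: highest n-gram order included in the geometric mean (assumed
    uniform weights across orders).
    """
    matched = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n     # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["the cat sat on the mat".split()]
    ref = ["the cat sat on a mat".split()]
    print(corpus_bleu(hyp, ref, max_n=4))
```

The sketch omits smoothing and multiple references, both of which standard scoring tools usually support, so its numbers should not be compared directly with the scores in the abstract.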
Keywords/Search Tags: Chinese-Cyrillic Mongolian Machine Translation, Cyrillic Mongolian Corpus, Phrase, Translation Model, Language Model