A Study On Statistical And Rule-Based Combined Mongolian-Chinese Machine Translation

Posted on: 2018-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: D M Q E Wu
GTID: 2348330512496461
Subject: Computer application technology
Abstract/Summary:
Machine translation is a research area of great significance and practical value. By implementation approach it can be divided into rule-based, statistical, and example-based machine translation, each with its own advantages and disadvantages. Rule-based machine translation describes the correspondences between two languages with high accuracy and does not rely on a large bilingual corpus, but it is hard for rules to cover every linguistic phenomenon. Statistical machine translation has progressed through phrase-based, syntax-based, and neural methods and has reached a new stage, yet it lacks in-depth use of linguistic knowledge. Both statistical and example-based machine translation require a large-scale bilingual parallel corpus, and the Mongolian-Chinese bilingual corpus available at present is limited, so combining statistical and rule-based machine translation is a research direction of considerable value. This thesis presents the following work.

First, we trained a probabilistic context-free grammar (PCFG) on a manually annotated treebank and built a PCFG-based Mongolian syntactic analysis system using the open-source toolkit NLTK. On this basis, a rule-based method identifies some basic phrases in advance as a preprocessing step for syntactic analysis, and the PCFG rules are then refined using Mongolian case. Several experiments confirmed that this method improves the accuracy of syntactic analysis.

Second, we built a Mongolian-Chinese transformation and generation rulebank and a Mongolian-Chinese phrase dictionary. A contrastive analysis of Mongolian and Chinese grammar from a linguistic perspective yielded a transformation and generation rulebank containing 25 kinds of rules. We aligned words with GIZA++ on a sentence-aligned Mongolian-Chinese parallel corpus to obtain a word alignment table, then refined it semi-automatically to obtain the phrase dictionary (1.5 million pairs).

Third, we designed and implemented the combined statistical and rule-based Mongolian-Chinese machine translation system. Mongolian analysis is carried out by a statistical method, while transformation and generation are carried out by rule-based methods; a rule-based method also translates Mongolian numerals automatically as a preprocessing step for translation. We then ran several groups of experiments to test the performance of the translation system. Compared with a phrase-based statistical machine translation system, the experimental results show that our system performs slightly worse overall, but for sentences with certain specific structures its translations are clearly better than those of the statistical method. If the accuracy of the syntactic analysis can be improved, the performance of the combined statistical and rule-based Mongolian-Chinese machine translation system can be improved further.
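The first step described in the abstract, estimating a probabilistic context-free grammar from a manually annotated treebank, can be illustrated with a minimal sketch of relative-frequency estimation. This is not the thesis's implementation: the thesis uses NLTK, whereas the sketch below is plain Python to stay dependency-free, and the nested-tuple tree format, the labels (`S`, `NP`, `VP`, `N`, `V`), the placeholder tokens (`w1`, `w2`, `w3`), and the function names are all assumptions made for illustration.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree.

    A tree is (label, child, child, ...); a child is either another
    tree or a plain string token (a leaf).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(p for t in trees for p in productions(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical two-sentence treebank with placeholder tokens
# (not data from the thesis). The second tree is verb-final.
TOY_TREEBANK = [
    ("S", ("NP", ("N", "w1")), ("VP", ("V", "w2"))),
    ("S", ("NP", ("N", "w1")), ("VP", ("NP", ("N", "w3")), ("V", "w2"))),
]

pcfg = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")
```

In this toy treebank `VP` is expanded once as `V` and once as `NP V`, so each rule receives probability 0.5. A real system would also need smoothing and a parser on top of the grammar; NLTK provides `induce_pcfg` for the estimation step.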
Keywords/Search Tags: Mongolian-Chinese machine translation, transformation-generation rules, Mongolian numerals, Mongolian case