A Study On Tree To String Based Mongolian And Chinese Statistical Machine Translation

Posted on: 2017-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Ning
GTID: 2348330485485711
Subject: Computer Science and Technology
Abstract/Summary:
Phrase-based statistical machine translation is currently the mainstream approach to Mongolian-Chinese machine translation. Although the method is mature, it has some inherent defects: poor generalization, limited ability to adjust word order over long distances, inability to represent translations of discontinuous phrases, and output sentences that do not conform to syntax. These deficiencies limit further development of the method, so introducing syntactic structure information into machine translation systems has become a new trend. Syntax-based statistical machine translation has also been studied extensively, and some of the latest syntax-based systems perform better than phrase-based systems. Implementing such a system also requires a high-accuracy Mongolian syntactic parser, which has high practical value in its own right. Statistical parsing has long been a research focus. In recent years researchers have achieved some results in Mongolian statistical parsing, but this work still lags behind statistical parsing research for English, Chinese and other languages. Research on Mongolian statistical parsing still focuses on parsing models based on probabilistic context-free grammar (PCFG). Parsing research on English, Chinese and other languages shows that adding lexical information can improve parsing accuracy. This thesis covers three areas of work.
First, it surveys related research at home and abroad, then implements a PCFG-based parsing system with the open-source Stanford Parser and realizes an unlexicalized PCFG-based parsing system; next, it designs and implements a tree-to-string Mongolian-Chinese statistical machine translation system; finally, it conducts experiments on the Mongolian-Chinese statistical machine translation system and evaluates the results. The Mongolian parsing experiments show that precision and recall in the unlexicalized PCFG-based parsing experiments are 0.7701 and 0.7707 respectively, which are higher than those in the vanilla PCFG-based experiments. The translation experiments show that the BLEU and NIST scores of the tree-to-string Mongolian-Chinese system are almost the same as those of the phrase-based system. This suggests that if Mongolian parsing accuracy can be further improved, the accuracy of the tree-to-string Mongolian-Chinese machine translation system can be improved further as well.
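The abstract reports parsing precision and recall but does not say how they were computed. As a rough illustration only, the sketch below assumes the common labeled-bracket scheme, in which a predicted constituent counts as correct when its label and span both match a gold constituent. The tuple-based tree encoding, the function names, and the toy trees are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def labeled_spans(tree, start=0):
    """Collect (label, start, end) spans from a tree.

    A tree is a tuple (label, child, child, ...); a leaf token is a string.
    Returns the span multiset and the position after the last token covered.
    """
    label, *children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a leaf token advances the position by one
        else:
            child_spans, pos = labeled_spans(child, pos)
            spans += child_spans
    spans[(label, start, pos)] += 1
    return spans, pos


def bracket_precision_recall(gold, predicted):
    """Labeled-bracket precision and recall of one predicted tree against gold."""
    gold_spans, _ = labeled_spans(gold)
    pred_spans, _ = labeled_spans(predicted)
    matched = sum((gold_spans & pred_spans).values())  # multiset intersection
    precision = matched / sum(pred_spans.values())
    recall = matched / sum(gold_spans.values())
    return precision, recall


# Toy trees over placeholder tokens "a", "b", "c" (assumed, not thesis data).
gold = ("S", ("NP", "a", "b"), ("VP", "c"))
predicted = ("S", ("NP", "a", "b"), ("VP", ("V", "c")))
precision, recall = bracket_precision_recall(gold, predicted)
```

In this toy case the predicted tree contains one extra constituent, so all three gold brackets are recovered while only three of the four predicted brackets are correct. Standard scoring tools typically apply further conventions, such as ignoring preterminal nodes and punctuation, that this sketch leaves out.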
Keywords/Search Tags:Mongolian, Parsing, Unlexicalized, Tree to String, Machine Translation