Mongolian Lexical Analysis Research And Its Application In Statistical Machine Translation

Posted on: 2016-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Yu
Full Text: PDF
GTID: 2308330464466291
Subject: Computer application technology
Abstract/Summary:
Statistical methods are currently the most widely used approach to machine translation, and they require large bilingual parallel corpora. Mongolian and Chinese, however, differ greatly in language family, word order, sentence structure, morphology, and other respects. In traditional Mongolian-Chinese statistical machine translation, these differences lead to data sparsity, poor word alignment, and word-order errors in the output. Solving these problems takes more than blindly enlarging the training corpus; the words in the corpus must also be analyzed.

Lexical analysis is a basic research topic in natural language processing. Its accuracy directly affects syntactic parsing, semantic parsing, machine translation, information retrieval, automatic indexing, information extraction, and many other tasks, which makes it an essential module of natural language processing.

In this dissertation, to improve the quality of Mongolian-Chinese statistical machine translation, we integrate Mongolian morphological information into the translation system on the basis of Mongolian lexical analysis. First, we use a phrase-based statistical machine translation method to perform part-of-speech (POS) tagging of in-vocabulary Mongolian words, and a POS tagger based on hidden Markov models to tag unknown words. Next, we perform Mongolian morphological segmentation with the phrase-based statistical machine translation method and segment unknown words with a method based on a suffix dictionary. Finally, we integrate factors such as Mongolian stems, suffixes, and parts of speech into Mongolian-Chinese machine translation and run several groups of experiments. The experimental results show that the factored translation model with Mongolian morphological information improves translation quality, indicating that lexical analysis plays an important role in Mongolian-Chinese statistical machine translation.
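As a rough illustration of the suffix-dictionary segmentation and the factored input described in the abstract, the sketch below strips known suffixes from the end of a word and emits a pipe-separated token of the form surface|stem|suffixes|POS. Everything in it is an assumption made for illustration: the suffix list, the example words (toy Latin-letter strings, not verified Mongolian morphology), the function names `segment` and `to_factored`, the minimum stem length, and the greedy longest-match strategy. The pipe-separated layout follows the convention of toolkits such as Moses; the abstract does not say which toolkit or segmentation algorithm the thesis actually used.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy suffix dictionary; a real system would load a curated list.
SUFFIX_DICT: Sequence[str] = ("uud", "iin", "d")


def segment(word: str,
            suffixes: Sequence[str] = SUFFIX_DICT,
            min_stem: int = 2) -> Tuple[str, List[str]]:
    """Greedily strip dictionary suffixes from the end of `word`.

    Longer suffixes are tried first, and a suffix is only removed if at
    least `min_stem` characters remain as the stem.
    """
    ordered = sorted(suffixes, key=len, reverse=True)
    found: List[str] = []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        for suf in ordered:
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                found.insert(0, suf)       # keep suffixes in surface order
                stem = stem[: -len(suf)]
                stripped = True
                break
    return stem, found


def to_factored(word: str, pos: str) -> str:
    """Build one factored token: surface|stem|suffixes|POS."""
    stem, sufs = segment(word)
    suffix_factor = "+".join(sufs) if sufs else "NULL"
    return "|".join([word, stem, suffix_factor, pos])


if __name__ == "__main__":
    # The POS tags here are supplied by hand; in the thesis they would come
    # from the POS tagging step.
    print(to_factored("geruudiin", "N"))
    print(to_factored("nom", "N"))
```

Keeping the stem, suffix, and POS as separate factors lets a factored translation model back off from a rare surface form to its more frequent stem, which is one way morphological information can reduce data sparsity.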
Keywords/Search Tags: part-of-speech tagging, morphological segmentation, phrase-based Mongolian-Chinese machine translation, factored translation model