
A Study on Chinese-Naxi Statistical Machine Translation Methods

Posted on: 2016-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Cheng
Full Text: PDF
GTID: 2208330470968130
Subject: Control engineering
Abstract/Summary:
In recent years, China has paid increasing attention to minority languages. Naxi, a minority language whose script is the only ancient hieroglyphic writing system still in use in the world, is dying out as modern civilization develops. Chinese-to-Naxi statistical machine translation is therefore significant both for academic research and for practical bilingual study. Syntax-based statistical machine translation is the current mainstream approach and performs well on long-distance reordering. Recently, researchers have incorporated lexical semantics, topic information, discourse semantics, and other knowledge into machine translation systems and significantly improved translation accuracy. In this thesis, we study Chinese-Naxi machine translation from two aspects, lexical semantics and topic information:

(1) Chinese-to-Naxi statistical machine translation based on word sense induction. To make better use of both the source-language and target-language sides of the corpus for acquiring lexical semantic knowledge, we first extract the context windows of the same word from the word-aligned bilingual corpus and combine them into pseudo-texts. We then apply a nonparametric Bayesian topic model that automatically learns sense clusters for words from these pseudo-texts; the clusters serve as the inferred senses of the words. The proposed sense-based translation model uses maximum entropy classifiers so that the decoder can select appropriate translations for source words according to their inferred senses. Finally, we integrate the sense-based translation model into a syntax-based statistical machine translation system. Results show that the proposed model substantially outperforms the baseline.

(2) Statistical machine translation based on topic information. We first predefine a set of topics according to the characteristics of the Naxi corpus and determine the topic of every parallel sentence pair in the corpus, and then extract bilingual synchronous rules that carry topic information. During decoding, we first determine the topic of the source-language sentence, then compute the similarity between the sentence to be translated and the extracted synchronous rules using a topic similarity model and a topic sensitivity model, and finally search for the target sentence with the highest probability once all subtrees have been fully derived.

(3) In the traditional tree-to-string template extraction process based on phrase-structure trees, topic information is integrated into the tree-to-string templates through a vector space model (VSM). The probability distribution of the templates is then estimated, and during decoding the appropriate template is selected for each subtree under topic constraints. On the basis of the above, we implement a Chinese-Naxi statistical machine translation system based on the topic similarity model.
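As a rough illustration of the topic-constrained template selection described above, the following minimal Python sketch represents the sentence and each candidate template as a vector over the predefined topics and picks the template with the highest similarity. The abstract does not specify the similarity measure, so cosine similarity is an assumption here; the function names, template names, and topic weights are all hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors.

    Cosine similarity is an assumed measure; the thesis abstract only
    states that a vector space model (VSM) is used.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_u * norm_v)


def pick_template(sentence_topics, templates):
    """Return the name of the template whose topic vector is most
    similar to the topic vector of the sentence being translated."""
    return max(templates, key=lambda name: cosine(sentence_topics, templates[name]))


# Hypothetical topic weights over three predefined topics.
sentence_topics = [0.8, 0.1, 0.1]
templates = {
    "template_a": [0.7, 0.2, 0.1],
    "template_b": [0.1, 0.1, 0.8],
}

best = pick_template(sentence_topics, templates)
print(best)
```

In a full decoder, such a similarity score would more plausibly enter as one feature alongside the translation and language model scores rather than as a hard selection; the hard `max` here only keeps the sketch short.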
Keywords/Search Tags: word sense induction, SMT, Naxi, similarity model, sensitivity model, maximum entropy model