The Research On Hierarchical Phrase-based Statistical Machine Translation With Japanese Case Grammar

Posted on: 2016-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: J M Liu
GTID: 2308330467472803
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation (MT) is a field of research and application concerned with using computers to translate between two different languages. As science advances and the volume of information grows, demand for machine translation is increasing in areas such as travel conversation, product globalization, and information retrieval. Research on machine translation has made good progress, but the resulting applications and products are not yet widely considered satisfactory. In corpus-based statistical machine translation, how to use linguistic information efficiently remains an active problem.

In commercial applications, statistical machine translation still holds the leading position, with the phrase-based and hierarchical phrase-based models being the most popular. Syntax-based translation models, by contrast, are hard to deploy in products because of their complexity. For research, however, translation models that incorporate more linguistic information and syntactic structure have much greater potential. How to use linguistic information and syntactic structure to enhance translation models has therefore become a central problem.

Case grammar is a mature, semantically motivated grammar. Japanese has an explicit case structure, in which case information is marked by distinct case particles, so Japanese case frames are easier to analyze than those of other languages. Japanese case frames are also already used in Japanese syntactic parsers, which as a result outperform the parsers of other languages. This thesis therefore proposes a method for applying Japanese case frames to the hierarchical phrase-based model, which is the first such attempt in statistical machine translation. The contributions of this thesis are summarized below:

(1) Japanese case frames are used to constrain the rules extracted by the hierarchical phrase-based model. The constrained rules can be described at the semantic level and are more meaningful. The goal is to make efficient use of the relationship between statistics and linguistics. Experiments show that the constrained rule set is smaller than that of the traditional model.

(2) Japanese case frame reordering rules are extracted and used in translation. The goal is to reduce the hierarchical phrase-based model's frequent reliance on the glue rule for long-distance reordering. At the same time, errors from word alignment are taken into account by applying soft constraints during rule extraction. Experiments show improved translation quality as measured by BLEU score.

(3) A chunk-based dependency tree-to-string translation process is proposed at the upper level, so that the translation is more reasonable and avoids the influence of function-word and particle alignment.

Experimental analysis shows that the proposed methods are reasonable and effective.
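
As a rough illustration of what a soft case-based constraint on rule extraction could look like, the sketch below splits a tokenized Japanese sentence into chunks that close after each case particle, then counts how many chunks a candidate source span cuts through. It is not the thesis's actual algorithm: the particle list, the chunking heuristic, and the function names case_chunks and crossing_count are all assumptions made for this example. A real system would take chunks and case frames from a Japanese parser, and would use the count as a feature or penalty rather than as a hard filter.

```python
# Hypothetical, simplified list of Japanese case particles (an assumption
# for this sketch, not the inventory used in the thesis).
CASE_PARTICLES = {"が", "を", "に", "で", "へ", "と", "から", "より", "まで"}


def case_chunks(tokens):
    """Split tokens into half-open (start, end) spans.

    A chunk is closed right after each case particle; any trailing
    tokens (typically the predicate) form the final chunk.
    """
    chunks, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in CASE_PARTICLES:
            chunks.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        chunks.append((start, len(tokens)))
    return chunks


def crossing_count(span, chunks):
    """Count chunks that the half-open span overlaps only partially.

    Zero means the span respects every chunk boundary. A soft constraint
    would turn a non-zero count into a penalty instead of discarding
    the rule outright.
    """
    s, e = span
    count = 0
    for cs, ce in chunks:
        overlaps = s < ce and cs < e
        contained = s <= cs and ce <= e
        if overlaps and not contained:
            count += 1
    return count


tokens = ["彼", "が", "本", "を", "読む"]
chunks = case_chunks(tokens)
print(chunks)                          # [(0, 2), (2, 4), (4, 5)]
print(crossing_count((0, 4), chunks))  # 0: covers two whole chunks
print(crossing_count((1, 3), chunks))  # 2: cuts through two chunks
```

Keeping the count as a score rather than a yes/no filter reflects the abstract's point that word-alignment errors make hard constraints risky.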
Keywords/Search Tags: statistical machine translation, case grammar, hierarchical phrase-based model, rule constraints, rule extraction