
Study On Word Alignment Technology And Construction Of Statistical Machine Translation Platform

Posted on: 2010-11-01  Degree: Master  Type: Thesis
Country: China  Candidate: T N Li
GTID: 2178360308978402  Subject: Computer software and theory
Abstract/Summary:
Statistical methods have recently shown great power in machine translation (MT). A variety of statistical machine translation (SMT) systems have been studied, including phrase-based, hierarchical phrase-based, and syntax-based systems. Each has its own characteristics, and each has performed well in different domains. This thesis focuses on constructing a multi-engine SMT platform and, in addition, on improving symmetrization methods for word alignment. The work is summarized as follows.

(1) Research on a re-alignment method for phrase-based translation systems. To address the symmetrization problem of word alignment, we propose an improved symmetrization method. We first identify the parts where the bidirectional word alignments generated by the IBM models are inconsistent, and then re-align those inconsistent parts. To further improve performance, we propose a method that enhances word alignment by making use of a large monolingual corpus. The experimental results show that our method outperforms the conventional symmetrization method on the SSMT07 Chinese-to-English translation task.

(2) Construction of a multi-engine SMT platform. We construct a multi-engine SMT platform, which is necessary both for research on SMT models and algorithms and for real-world applications of SMT techniques. The platform comprises three state-of-the-art SMT engines: a phrase-based system, a hierarchical phrase-based system, and a syntax-based system. To make the engines work well together, we apply the idea of modularization: the platform is divided into functional modules, which are then integrated. There are six modules: 1) the pre-processing module; 2) the word alignment module; 3) the rule and phrase extraction module; 4) the decoder; 5) the system combination module; and 6) the post-processing module. Modules 3) and 4) vary between SMT engines. We carry out experiments on different data sets and compare the performance of the different SMT engines.

In summary, this work focuses on the construction of a multi-engine SMT platform. It also studies the symmetrization problem of word alignment and proposes an improved method.
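The first step of the re-alignment method in (1), identifying where the two directional alignments disagree, can be illustrated with a minimal Python sketch. It assumes each alignment is given as a set of (source index, target index) links already expressed in the same orientation. The function names and the toy data are hypothetical, and the re-alignment step itself is not shown because the abstract does not specify it.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def split_alignments(src2tgt: Set[Link], tgt2src: Set[Link]) -> Tuple[Set[Link], Set[Link]]:
    """Split bidirectional alignment links into consistent and inconsistent sets.

    Consistent links appear in both directions (set intersection);
    inconsistent links appear in exactly one direction (symmetric difference).
    """
    consistent = src2tgt & tgt2src
    inconsistent = src2tgt ^ tgt2src
    return consistent, inconsistent


def inconsistent_positions(inconsistent: Set[Link]) -> Tuple[List[int], List[int]]:
    """Return the sorted source and target positions touched by inconsistent links."""
    src_positions = sorted({i for i, _ in inconsistent})
    tgt_positions = sorted({j for _, j in inconsistent})
    return src_positions, tgt_positions


# Toy example (hypothetical data): the two directions agree on two links only.
src2tgt = {(0, 0), (1, 1), (2, 3)}
tgt2src = {(0, 0), (1, 1), (2, 2), (3, 3)}

consistent, inconsistent = split_alignments(src2tgt, tgt2src)
positions = inconsistent_positions(inconsistent)
```

In this sketch the consistent links could be kept as they are, while the positions returned by `inconsistent_positions` mark the regions that a re-alignment step would revisit.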
Keywords: IBM model, word alignment, symmetrization, statistical machine translation, multi-engine translation platform