Research On System Combination In Machine Translation

Posted on: 2012-10-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Liu
GTID: 1118330362462090
Subject: Computer application technology

Abstract/Summary:
Machine translation research dates back several decades. The dominant direction today is statistical machine translation (SMT), which has evolved from the classic word-based models, through the fairly mature phrase-based models, to formal grammar-based and syntax-based models with increasingly rich feature sets. SMT has thus entered an age of "a hundred flowers blooming and a hundred schools contending". However, different models and decoding styles differ in their reordering ability and in the search space they explore. Given the translations produced by multiple translation models, the task of system combination is to merge them so that each system's strengths offset the weaknesses of the others. Although system combination has delivered significant improvements in recent years, many open issues remain to be addressed.

By combination granularity, system combination can be sentence-level, phrase-level or word-level. By combination stage, it can be performed as post-processing or during decoding. This thesis concentrates on word-level and post-processing system combination. The key components of system combination include skeleton selection, the alignment method, reordering and the decoding style. To improve combination performance, the thesis studies enhanced alignment methods, a reordering model, and a new training and decoding framework for word-level combination, as well as decoding styles for model combination. Specifically, the major contributions of this thesis are as follows:

1. The proposed framework is based on word-level combination with an incremental strategy. To demonstrate the effectiveness of the method, we use different incremental orders in Translation Error Rate (TER) alignment, and the experimental results show improvements. The order of the hypotheses influences the alignment quality. Moreover, to overcome the drawback of exact matching in TER, we improve the alignment between candidate and hypothesis translations through stemming and WordNet-based word sense disambiguation (WSD).

2. During the training of system combination, the skeleton selected for the confusion network decides the word order of the hypotheses. Traditional system combination selects a hypothesis as the skeleton by minimum Bayes risk (MBR), which gives the confusion network a single word order, even though the input hypotheses from the various models have different reorderings. To exploit the reorderings of the multiple translation systems, we construct a super confusion network: multiple confusion networks, built with classic monolingual alignment methods and augmented with new confusion-network-based and consensus-based features. The experimental results verify the effectiveness of the two methods.

3. We investigate hypergraph-based training and decoding in system combination. At the training stage, we introduce the second-order semiring for gradient computation. At the decoding stage, we address the limited search space of N-best decoding. The thesis first presents a hypergraph-based decoding method that uses the Cube Growing algorithm instead of Cube Pruning. The algorithm performs better mainly because of the larger search space and the better integration of the language model feature. Then, we re-rank the hypergraph with n-gram features to address spurious ambiguity and to perform consensus decoding. Finally, we combine the two system combination models.

4. Because each grammar differs in what it can express, we combine the hierarchical phrase-based grammar and the bracketing transduction grammar in a unified framework so that each compensates for the other's drawbacks. Unlike system combination, the decoding framework in model combination does not require re-training or re-decoding stages. Moreover, model combination performs better than either individual model.

In brief, the thesis establishes a fairly complete set of methods for system combination, in particular training and decoding methods that explore a new research direction for system combination, a hard problem in NLP.
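
The MBR skeleton selection that the abstract attributes to traditional system combination can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a uniform posterior over the system outputs and uses plain word-level edit distance, normalized by the length of the comparison hypothesis, as a simplified stand-in for TER (real TER also allows block shifts). The function names `edit_distance` and `select_skeleton` and the example sentences are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute).

    Assumption: used here as a simplified stand-in for TER, which
    additionally permits block shifts.
    """
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_skeleton(hypotheses: List[str]) -> int:
    """Return the index of the hypothesis with the lowest expected loss.

    Assumption: every other hypothesis is treated as an equally likely
    pseudo-reference (uniform posterior).
    """
    tokenized = [h.split() for h in hypotheses]

    def risk(i: int) -> float:
        return sum(
            edit_distance(tokenized[i], other) / max(len(other), 1)
            for k, other in enumerate(tokenized)
            if k != i
        )

    return min(range(len(tokenized)), key=risk)


# Hypothetical outputs from four translation systems for one source sentence.
hyps = [
    "the cat sat on the mat",
    "the cat sits on the mat",
    "a cat sat on mat",
    "the cat sat on a mat",
]
skeleton_index = select_skeleton(hyps)
print(hyps[skeleton_index])
```

Because a single skeleton fixes the word order of the whole confusion network, a sketch like this also shows why the choice matters: every other hypothesis would subsequently be aligned against `hyps[skeleton_index]`.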
Keywords/Search Tags: statistical machine translation, system combination, incremental alignment, super confusion network, hypergraph decoding