Research On Machine Translation System Combination Based On Confusion Network

Posted on: 2011-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhu
Full Text: PDF
GTID: 2178360308955509
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the development of machine translation, many different kinds of machine translation systems have been invented. System combination is a technique that uses the outputs of different machine translation systems to improve translation quality, and it has recently become one of the research hotspots in machine translation. Current system combination methods fall into two major types: sentence-level system combination and word-level system combination. Sentence-level system combination is based on minimum Bayes-risk decoding, while word-level system combination is realized in the form of a confusion network. Because it improves translation quality stably and remarkably, word-level system combination has become the popular method. It first aligns the outputs of the different machine translation systems, then builds a confusion network from the word alignment of those outputs, and finally extracts the best combination result from the confusion network. Word alignment quality is therefore vital to the effect of word-level system combination.

This thesis comprehensively analyzes and summarizes the current sentence-level and word-level system combination methods. Because the word alignment used in current word-level system combination does not take linguistic information into account, a new word-level system combination method that aims to improve word alignment quality is proposed. In addition, to address the over-reliance on reference sentences and the weak reordering ability of system combination based on a single confusion network, decoding over multiple confusion networks is proposed. Rescoring and minimum Bayes-risk decoding are used to decode the multiple confusion networks. The main contributions of this thesis are listed below:

1. Research and realization of sentence-level and word-level system combination

Sentence-level system combination is based on minimum Bayes-risk decoding, which requires a loss function to be defined. Three loss functions, based on BLEU, TER (Translation Error Rate) and WER (Word Error Rate) respectively, are defined to test the effect of sentence-level system combination. Classical word-level system combination uses TER, incremental TER or GIZA++ to align the different machine translations, and the confusion network is built from the resulting word alignment. A language model, word posterior probability and a word penalty are integrated into confusion network decoding through a log-linear model. The final combination result is extracted from the confusion network by beam search. Experimental results show that sentence-level system combination improves translation quality by 0.5 BLEU and word-level system combination improves it by 1.0 BLEU.

2. Research of methods to improve word alignment quality

Because word alignment quality is critical to word-level system combination, two methods are used to improve it. The first adds linguistic information to word alignment; the second integrates another source of word alignment information. Stems and synonyms are the two kinds of linguistic information used. The other source of alignment information is obtained through alignment by agreement, and the results of alignment by agreement and GIZA++ are integrated by intersection or union. Linguistic information relieves the data sparseness problem of word alignment, and integrating the alignment-by-agreement result improves word alignment precision. Experimental results confirm that better word alignment quality leads to better system combination results.

3. Research and realization of improved word-level system combination

Improved word-level system combination is realized through improved word alignment quality and improved confusion network decoding. The word-level system combination is built on the improved word alignment. Improved confusion network decoding uses multiple confusion networks to overcome the shortcomings of decoding a single confusion network, namely over-reliance on reference sentences and limited word reordering ability. Rescoring and minimum Bayes-risk decoding are used to extract the best result in improved confusion network decoding. Experiments show that the improved word-level system combination methods raise translation quality by about 0.5 BLEU over the unimproved methods.
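
The sentence-level combination described in the abstract, minimum Bayes-risk decoding with a WER-based loss, can be illustrated with a small sketch. This is not the thesis's implementation: the function names (edit_distance, wer_loss, mbr_select) are hypothetical, the posterior over system outputs is assumed to be uniform when none is supplied, and WER is assumed to be the word-level edit distance divided by the length of the sentence treated as reference. The sketch picks the system output with the lowest expected loss against all the outputs.

```python
from typing import List, Optional, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitution, insertion, deletion)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer_loss(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Assumed WER loss: edit distance normalized by reference length."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


def mbr_select(hypotheses: List[str],
               posteriors: Optional[List[float]] = None) -> str:
    """Return the hypothesis with minimum expected WER loss.

    Each hypothesis is scored against every hypothesis (itself included,
    at zero loss), weighted by that hypothesis's posterior probability.
    A uniform posterior is an assumption made here for illustration.
    """
    tokenized = [h.split() for h in hypotheses]
    if posteriors is None:
        posteriors = [1.0 / len(hypotheses)] * len(hypotheses)

    def expected_loss(i: int) -> float:
        return sum(p * wer_loss(tokenized[i], ref)
                   for ref, p in zip(tokenized, posteriors))

    best = min(range(len(hypotheses)), key=expected_loss)
    return hypotheses[best]
```

Calling mbr_select on a list of system outputs for one source sentence returns the output that agrees most with the others under the chosen loss. Swapping wer_loss for a BLEU-based or TER-based loss would give the other two variants the abstract mentions, assuming suitable sentence-level implementations of those metrics.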
Keywords/Search Tags: Machine Translation, System Combination, Word Alignment, Minimum Bayes-Risk Decoding, Confusion Network, Rescoring