
Research On Discriminative Training Methods For Statistical Machine Translation

Posted on: 2014-08-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L M Liu
Full Text: PDF
GTID: 1268330392472658
Subject: Computer Science and Technology

Abstract/Summary:
Over the last two decades, statistical machine translation (SMT) has achieved great success; nevertheless, it still falls far short of human requirements and needs further development and improvement. From the viewpoint of mathematical modeling, one promising direction for SMT is the transition from a few features and small models to many features and large models, and from linear models to nonlinear models. Following this direction, this dissertation starts from the log-linear translation model, the most popular model in SMT, and investigates the following topics in discriminative training.

(1) For the log-linear model with a few features, the most successful tuning method, MERT, suffers from instability. Because the k-best translation list changes at each optimization step, the optimization objective, which is defined over that list, changes as well; the optimized weights therefore oscillate, and this makes MERT unstable. This dissertation applies the idea of the ultraconservative update when designing the optimization objective, and proposes a new tuning method called error rate minimization based on ultraconservative update. Experiments show that it outperforms MERT.

(2) For the log-linear model with a large number of sparse features, existing tuning methods can be applied efficiently, but their performance is limited by feature sparsity. This dissertation considers two practical remedies, enlarging the tuning set and L1 regularization, and shows that neither is sufficient. It therefore proposes a novel tuning method based on automatic feature grouping to relieve feature sparsity, together with an online learning method for learning the feature group structure efficiently. Experiments show that this tuning method outperforms the existing ones.

(3) Existing tuning methods for the log-linear model usually suffer from two shortcomings. First, their performance depends heavily on the choice of a development set, but a suitable development set is often unavailable and hard to create, which may lead to unstable test performance when the development and test sets differ. Second, they optimize a single weight vector on a given development set, and this single vector cannot produce consistent results at the sentence level. To overcome these shortcomings, this dissertation proposes a local training method that, unlike existing methods, tunes a separate weight vector for each test sentence. Since training efficiency is the bottleneck of local training, an efficient incremental training method is also proposed. Note that, in terms of its decision function at test time, the local training method behaves like a nonlinear model.

(4) When modeling translation phenomena, the log-linear model has two limitations: its features are strictly required to be linear with respect to the objective, which may make the model inadequate, and it cannot deeply interpret or represent its surface features. A potential solution to both limitations is modeling with neural networks. On one hand, neural networks go beyond the linearity limitation and can in fact approximate arbitrary continuous functions, so their modeling is more adequate; on the other hand, they can represent surface features through hidden units. However, classical neural networks pose a decoding-efficiency challenge due to their inherent characteristics when modeling and decoding are considered together. This dissertation therefore proposes a variant neural network for machine translation, called the Additive Neural Network, and investigates an efficient method for its discriminative training.
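The ultraconservative-update idea in (1) can be sketched as follows: instead of minimizing the k-best error rate alone, add a proximity penalty that keeps the new weights close to the previous ones, damping the oscillation between optimization steps. This is only an illustrative sketch under assumed data structures (per-sentence k-best feature matrices with error scores such as 1 minus sentence-level BLEU); the function names and the simple random-search optimizer are assumptions for illustration, not the dissertation's actual optimization procedure.

```python
import numpy as np

def kbest_error(w, kbest):
    """Average error of the 1-best hypotheses selected by weights w.

    kbest: list of (feats, errors) pairs, one per sentence:
      feats  shape (k, d) -- feature vectors of the k candidates
      errors shape (k,)   -- e.g. 1 - sentence-level BLEU of each candidate
    """
    total = 0.0
    for feats, errors in kbest:
        best = np.argmax(feats @ w)  # hypothesis the model would select
        total += errors[best]
    return total / len(kbest)

def ultraconservative_tune(kbest, w_prev, gamma=0.5, n_trials=2000, seed=0):
    """Minimize error(w) + gamma * ||w - w_prev||^2 by random search.

    The quadratic term is the ultraconservative (proximal) penalty:
    a candidate weight vector is accepted only if its error reduction
    outweighs how far it moves from the previous weights.
    """
    rng = np.random.default_rng(seed)
    best_w, best_obj = w_prev, kbest_error(w_prev, kbest)
    for _ in range(n_trials):
        w = w_prev + rng.normal(scale=0.5, size=w_prev.shape)
        obj = kbest_error(w, kbest) + gamma * np.sum((w - w_prev) ** 2)
        if obj < best_obj:
            best_w, best_obj = w, obj
    return best_w
```

On a toy two-feature k-best list where the previous weights pick high-error hypotheses, the tuned weights reduce the corpus error while staying near `w_prev`; in a real tuner the random search would be replaced by a line-search or gradient-based procedure over the same regularized objective.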
Keywords/Search Tags: statistical machine translation, log-linear model, discriminative training, ultraconservative update, feature grouping, local training, additive neural network