
Research On Discriminative Training Methods For Statistical Machine Translation

Posted on: 2014-08-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L M Liu
Full Text: PDF
GTID: 1268330392472658
Subject: Computer Science and Technology

Abstract/Summary:
Over the last two decades, statistical machine translation (SMT) has achieved great success; nevertheless, it still falls far short of human requirements and needs further development and improvement. From the viewpoint of mathematical modeling, one promising direction for SMT is the transition from a few features and small models to many features and large models, and from linear models to nonlinear models. Following this direction, this dissertation starts from the log-linear translation model, the most popular model in SMT, and investigates the following topics in discriminative training.

(1) For the log-linear model with a few features, the most successful tuning method, MERT, suffers from instability. Because the k-best translation list changes at each optimization step, the optimization objective, which is defined over that list, changes as well; the optimized weights therefore oscillate, and this makes MERT unstable. This dissertation applies the idea of the ultraconservative update when designing the optimization objective, and proposes a new tuning method called error rate minimization based on ultraconservative update. Experiments show that it outperforms MERT.

(2) For the log-linear model with a large number of sparse features, existing tuning methods can be applied efficiently, but their performance is limited by feature sparsity. This dissertation considers two practical remedies, enlarging the tuning set and L1 regularization, and shows that neither is sufficient. It therefore proposes a novel tuning method based on automatic feature grouping to relieve feature sparsity, together with an online learning method for learning the feature group structure efficiently. Experiments show that this tuning method outperforms the existing ones.

(3) Existing tuning methods for the log-linear model usually suffer from two shortcomings. First, their performance depends heavily on the choice of a development set, but a suitable development set is often unavailable and hard to create, which may lead to unstable test performance when the development and test sets differ. Second, they optimize a single weight vector on a given development set, and this single vector cannot produce consistent results at the sentence level. To overcome these shortcomings, this dissertation proposes a local training method that, unlike existing methods, tunes a separate weight vector for each test sentence. Since training efficiency is the bottleneck of local training, an efficient incremental training method is also proposed. Note that, in terms of its decision function at test time, the local training method behaves like a nonlinear model.

(4) When modeling translation phenomena, the log-linear model has two limitations: its features are strictly required to be linear with respect to the objective, which may make the model inadequate, and it cannot deeply interpret or represent its surface features. A potential solution to both limitations is modeling with neural networks. On one hand, neural networks go beyond the linearity limitation and can in fact approximate arbitrary continuous functions, so their modeling is more adequate; on the other hand, they can represent surface features through hidden units. However, classical neural networks pose a decoding-efficiency challenge due to their inherent characteristics when modeling and decoding are considered together. This dissertation therefore proposes a variant neural network for machine translation, called the Additive Neural Network, and investigates an efficient method for its discriminative training.
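The ultraconservative-update idea in (1) can be sketched as follows: instead of minimizing the k-best error rate alone, add a proximity penalty that keeps the new weights close to the previous ones, damping the oscillation between optimization steps. This is only an illustrative sketch under assumed data structures (per-sentence k-best feature matrices with error scores such as 1 minus sentence-level BLEU); the function names and the simple random-search optimizer are assumptions for illustration, not the dissertation's actual optimization procedure.

```python
import numpy as np

def kbest_error(w, kbest):
    """Average error of the 1-best hypotheses selected by weights w.

    kbest: list of (feats, errors) pairs, one per sentence:
      feats  shape (k, d) -- feature vectors of the k candidates
      errors shape (k,)   -- e.g. 1 - sentence-level BLEU of each candidate
    """
    total = 0.0
    for feats, errors in kbest:
        best = np.argmax(feats @ w)  # hypothesis the model would select
        total += errors[best]
    return total / len(kbest)

def ultraconservative_tune(kbest, w_prev, gamma=0.5, n_trials=2000, seed=0):
    """Minimize error(w) + gamma * ||w - w_prev||^2 by random search.

    The quadratic term is the ultraconservative (proximal) penalty:
    a candidate weight vector is accepted only if its error reduction
    outweighs how far it moves from the previous weights.
    """
    rng = np.random.default_rng(seed)
    best_w, best_obj = w_prev, kbest_error(w_prev, kbest)
    for _ in range(n_trials):
        w = w_prev + rng.normal(scale=0.5, size=w_prev.shape)
        obj = kbest_error(w, kbest) + gamma * np.sum((w - w_prev) ** 2)
        if obj < best_obj:
            best_w, best_obj = w, obj
    return best_w
```

On a toy two-feature k-best list where the previous weights pick high-error hypotheses, the tuned weights reduce the corpus error while staying near `w_prev`; in a real tuner the random search would be replaced by a line-search or gradient-based procedure over the same regularized objective.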
Keywords/Search Tags: statistical machine translation, log-linear model, discriminative training, ultraconservative update, feature grouping, local training, additive neural network