Research And Implementation Of Hierarchical Phrase-based Translation Model In Statistical Machine Translation

Posted on: 2011-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Xu
GTID: 2198330338479993
Subject: Computer Science and Technology
Abstract/Summary:
With international exchange growing more frequent and the Internet developing rapidly, communication across languages is becoming increasingly important, which has driven the rapid development of machine translation. Over the past decade, statistical machine translation has achieved great success and become the mainstream approach to machine translation. Phrase-based translation models, which go beyond the original word-based models, have been shown by recent empirical evaluations to be the state of the art. However, one major weakness of phrase-based models is that they cannot robustly reorder at the phrase level. Many researchers have therefore begun to introduce linguistic information into the translation model, and syntax-based translation models have become a research hotspot. In this thesis, we study in depth the state-of-the-art SCFG model, the hierarchical phrase-based translation model, and use syntactic information to guide it, which significantly improves translation quality.

Firstly, we introduce the hierarchical phrase-based translation model, which is based on synchronous context-free grammar (SCFG), and implement its training process, which includes rule extraction and rule scoring. Extensive experiments show that the constraints placed on rule extraction greatly influence translation performance. We also implement the decoder for the hierarchical phrase-based model and describe the data structures and efficient algorithms it uses. We demonstrate the advantage of hierarchical phrase rules by classifying them, and analyze the intrinsic properties of the model by comparing it with the phrase-based translation model.

Secondly, we add linguistic information to the hierarchical phrase-based translation model and introduce the related theory of syntax-based translation models and tree transducers. We first parse the source language to obtain syntactic information about the translation rules, then add this information to the hierarchical phrase-based translation model as soft constraints. We use Minimum Error Rate Training (MERT) to tune the parameters and verify the relationship between syntactic information and system performance through a large number of experiments.

Finally, we introduce a rule constraint model based on maximum entropy. We describe the principle, framework, and benefits of maximum entropy models. Samples carrying linguistic information are collected during training and used to train the maximum entropy model. Like the other models, the maximum entropy model is added to the log-linear model, and its validity is verified through extensive experiments.
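As a rough illustration of how a soft syntactic constraint can enter a log-linear model, the sketch below scores two candidate rule applications. It is not the thesis implementation: the feature names, the weights (which MERT would normally tune), the probabilities, and the parse spans are all hypothetical values chosen for the example.

```python
import math

# Hypothetical feature weights; in practice these would be tuned with MERT.
WEIGHTS = {"log_p_rule": 1.0, "log_p_lm": 0.5, "syntax_match": 0.3}


def syntax_match(span, constituents):
    """Soft constraint: 1.0 if the rule's source span is a parse constituent, else 0.0."""
    return 1.0 if span in constituents else 0.0


def score(candidate, constituents, weights=WEIGHTS):
    """Log-linear score: weighted sum of the candidate's feature values."""
    feats = {
        "log_p_rule": math.log(candidate["p_rule"]),
        "log_p_lm": math.log(candidate["p_lm"]),
        "syntax_match": syntax_match(candidate["span"], constituents),
    }
    return sum(weights[name] * value for name, value in feats.items())


# Assumed constituent spans (start, end) from a source-side parse.
constituents = {(0, 2), (2, 5), (0, 5)}

# Two made-up candidates: one crosses a constituent boundary, one matches a constituent.
candidates = [
    {"name": "crossing", "span": (1, 3), "p_rule": 0.30, "p_lm": 0.20},
    {"name": "matching", "span": (2, 5), "p_rule": 0.25, "p_lm": 0.20},
]

best = max(candidates, key=lambda c: score(c, constituents))
print(best["name"])
```

With these made-up numbers the constituent-matching candidate wins even though its rule probability is lower. Because the constraint is soft, setting the `syntax_match` weight to zero lets the crossing candidate win again, so the syntactic evidence biases the search rather than pruning it.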
Keywords/Search Tags: hierarchical phrase-based translation model, decoder, parsing, soft constraints, maximum entropy model