
Implementation And Analysis Of Tree To String Alignment Template Model In Statistical Machine Translation

Posted on: 2011-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
GTID: 2178330338979972
Subject: Computer Science and Technology
Abstract/Summary:
Statistical machine translation is the task of automatically translating text from one natural language into another using statistical methods. Linguistically motivated translation models have become a major focus for a growing number of statistical MT researchers. Among the many existing linguistic models, the tree-to-string alignment template model is a classic representative.

In this thesis, we first describe the tree-to-string alignment template model, which is guided by linguistic syntax, covering its formal definition, the estimation of its free parameters, and its decoding method. We implement a decoder for the model. To speed up decoding, we use cube pruning to prune hypotheses, which significantly reduces decoding time.

Second, we compare the tree-to-string alignment template model with the phrase-based model on three points. The tree-to-string alignment template model generalizes better than the phrase-based model, especially in exploiting non-contiguous habitual collocations. It also handles long-distance reordering better. Despite these advantages, it cannot express contiguous phrases that are not syntactic constituents. We then evaluate our decoder on the NIST 2005 and NIST 2008 MT evaluation sets, with Moses as the baseline system.

Finally, we discuss statistical transliteration of Chinese person names into English. We classify state-of-the-art statistical transliteration methods and introduce two transliteration models: a sequence-labeling-based model and a noisy-channel-based model. Based on extensive experiments, we reach the following conclusions: in the noisy-channel-based transliteration model, the basic Chinese unit should be the Chinese character; using syllable sequences on the English side significantly improves performance when the language model is low-order; and a reranking method yields better performance.
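
The cube pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis decoder: the function name `cube_prune`, the `(cost, item)` list layout, and the `combo_cost` hook are assumptions made for illustration. The idea is to combine two cost-sorted hypothesis lists lazily, expanding only the neighbors of the best combinations found so far instead of scoring the full grid.

```python
import heapq


def cube_prune(left, right, k, combo_cost=lambda a, b: 0.0):
    """Return up to k low-cost (cost, left_item, right_item) combinations.

    left, right: lists of (cost, item), each sorted by ascending cost.
    combo_cost: hypothetical extra cost for joining two items, standing in
    for a non-local term such as a language-model score.
    """
    if not left or not right or k <= 0:
        return []

    def score(i, j):
        return left[i][0] + right[j][0] + combo_cost(left[i][1], right[j][1])

    # Start from the corner of the grid, where both parts are cheapest.
    heap = [(score(0, 0), 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        cost, i, j = heapq.heappop(heap)
        best.append((cost, left[i][1], right[j][1]))
        # Push only the two grid neighbors of the cell just popped.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (score(ni, nj), ni, nj))
    return best


if __name__ == "__main__":
    left = [(1.0, "a"), (2.0, "b"), (5.0, "c")]
    right = [(0.5, "x"), (1.5, "y")]
    for cost, l_item, r_item in cube_prune(left, right, k=3):
        print(cost, l_item, r_item)
```

When the total cost is purely additive, as with the default `combo_cost`, this enumeration returns the true k best combinations. With a non-additive combination cost the result is only approximate, and the speedup comes from accepting that approximation.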
Keywords/Search Tags: tree to string alignment template model, statistical machine translation, decoder, named entity translation, transliteration