Research On Reordering Model Of Tree-to-string Machine Translation

Posted on: 2014-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2248330395987136
Subject: Computer application technology
Abstract/Summary:
The reordering problem in machine translation is caused by the non-monotonic mapping between languages. Non-monotonic mappings are widespread between languages; they occur even within a single language, because of differences between dialects and between ways of expressing the same content. Reordering has therefore become one of the key research topics in machine translation.

This thesis proposes a solution to reordering in tree-to-string machine translation. First, Chinese dependency parsing is carried out according to the features of the Chinese language; second, the bilingual word alignment result is obtained; finally, a reordering model is generated from the dependency tree of the source language and the bilingual word alignment result. The reordering model transforms the dependency tree of the source language into a string that follows the preferred logical structure of the target language. The main work of this thesis is as follows:

(1) Because of the specific language features and parsing difficulties of Chinese, this thesis proposes an improved MST dependency parser for Chinese. The dependency relation between any two words is described by three models: the dependency direction discrimination model, the head POS recognition model, and the maximum spanning tree model. The Eisner algorithm is then used to search for and generate the dependency trees. Dependency direction discrimination and head POS recognition are converted into sequence labeling problems, so that conditional random fields can be used to model them. To suit the particular nature of head POS recognition, the decoding of the conditional random fields is modified to reduce the size of the search space and improve efficiency.

(2) Based on the dependency tree of the source language and the bilingual word alignment result, reordering rules are automatically extracted from the bilingual corpus and the reliability of each rule is evaluated. For rule matching, this thesis provides a similarity calculation method based on HowNet, which raises the calculation to the semantic level. The reliability is useful for resolving conflicts when several rules match.

The NTCIR-9 patent Chinese-English corpus is used for the reordering experiment. The experimental results show that the translation produced with the method in this thesis has a BLEU value of 0.2605, which is 3 percent higher than that of Moses. In the dependency parsing experiment on the CoNLL-2009 Shared Task dataset, the Unlabeled Attachment Score of the method in this thesis reaches 86.27%, which is better than the current mainstream models.
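The Eisner search step named in item (1) can be illustrated with a minimal sketch. This is the standard first-order Eisner dynamic program for projective dependency trees, not the thesis's improved parser: the function name `eisner_parse`, the `scores` matrix, and the example values are assumptions made for illustration, and the three models described in the abstract are replaced here by an arbitrary table of arc scores.

```python
from typing import List

NEG_INF = float("-inf")


def eisner_parse(scores: List[List[float]]) -> List[int]:
    """Return heads[m] for every token m; heads[0] is -1 for the artificial root.

    scores[h][m] is the (assumed, illustrative) score of an arc from head h
    to modifier m. Index 0 is an artificial root placed before the sentence.
    This plain variant allows the root to have more than one child.
    """
    n = len(scores)
    # Last index is the direction: 0 = head at the right end of the span,
    # 1 = head at the left end of the span.
    complete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    inc_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        complete[i][i][0] = complete[i][i][1] = 0.0

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete spans: join two complete spans, then add an arc
            # between the two end points s and t.
            best, best_r = NEG_INF, s
            for r in range(s, t):
                val = complete[s][r][1] + complete[r + 1][t][0]
                if val > best:
                    best, best_r = val, r
            incomplete[s][t][0] = best + scores[t][s]  # arc t -> s
            incomplete[s][t][1] = best + scores[s][t]  # arc s -> t
            inc_bp[s][t][0] = inc_bp[s][t][1] = best_r
            # Complete span headed at t (right end).
            for r in range(s, t):
                val = complete[s][r][0] + incomplete[r][t][0]
                if val > complete[s][t][0]:
                    complete[s][t][0], comp_bp[s][t][0] = val, r
            # Complete span headed at s (left end).
            for r in range(s + 1, t + 1):
                val = incomplete[s][r][1] + complete[r][t][1]
                if val > complete[s][t][1]:
                    complete[s][t][1], comp_bp[s][t][1] = val, r

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, is_complete: bool) -> None:
        # Recursive for brevity; fine for sentence-length inputs.
        if s == t:
            return
        if is_complete:
            r = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


# Toy example: root (0) plus three words; the made-up scores favour
# the arcs 0 -> 2, 2 -> 1 and 2 -> 3.
example_scores = [
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0, 10.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(eisner_parse(example_scores))  # [-1, 2, 0, 2]
```

The search runs in cubic time in the sentence length and only produces projective trees. In a parser of the kind the abstract describes, the entries of `scores` would come from trained models rather than from a hand-written table.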
Keywords/Search Tags:Reordering model, Chinese Dependency Parser, Word Alignment, Machine translation