Research On Reordering Model Of Tree-to-string Machine Translation

Posted on:2014-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2248330395987136

Subject:Computer application technology

Abstract/Summary:

The reordering problem in machine translation arises from non-monotonic mappings between languages. Such non-monotonicity is widespread across languages, and it exists even within a single language because of differing dialects and modes of expression. Reordering has therefore become one of the key research topics in machine translation.

This thesis proposes a solution to reordering in tree-to-string machine translation. First, Chinese dependency parsing is carried out according to the features of the Chinese language; second, the bilingual word alignment result is obtained; finally, a reordering model is generated from the dependency tree of the source language and the bilingual word alignment result. The reordering model transforms the dependency tree of the source language into a target-language string with the preferred logical structure. The main work of this thesis is as follows:

(1) Because of the specific linguistic features and parsing difficulties of Chinese, this thesis proposes an improved MST dependency parser for Chinese. The dependency relation between any two words is described by three models: the dependency direction discrimination model, the head POS recognition model, and the maximum spanning tree model. The Eisner algorithm is then used to search for and generate the dependency trees. By converting dependency direction discrimination and head POS recognition into sequence labeling tasks, conditional random fields can be used for modeling. To suit the particular characteristics of head POS recognition, the decoding of conditional random fields is improved to reduce the search space and increase efficiency.

(2) Based on the source-language dependency tree and the bilingual word alignment result, reordering rules are automatically extracted from the bilingual corpus and the reliability of each rule is evaluated. For rule matching, this thesis provides a similarity calculation method based on HowNet, which raises the calculation to the semantic level. The reliability score is useful for resolving conflicts when several rules match.

The NTCIR-9 patent Chinese-English corpus is used for the reordering experiment. The experimental results show that translation with the method in this thesis achieves a BLEU score of 0.2605, which is 3 percent higher than Moses. In the dependency parsing experiment on the CoNLL-2009 Shared Task dataset, the Unlabeled Attachment Score of the method in this thesis reaches 86.27%, which is better than the current mainstream models.
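
The rule extraction step in item (2) can be illustrated with a minimal sketch. It is not the thesis's actual procedure, which the abstract does not specify. The function name `extract_reordering_rules`, the rule format (a POS pattern with the head marked by `*`, plus a permutation), and the use of mean aligned target position as the ordering key are all assumptions made for illustration, as is the toy sentence.

```python
from collections import defaultdict


def extract_reordering_rules(heads, pos_tags, alignment):
    """Hypothetical sketch: one reordering rule per head word and its dependents.

    heads[i]  : index of the head of source word i, or -1 for the root
    pos_tags  : POS tag of each source word
    alignment : set of (source_index, target_index) word-alignment links
    Returns a list of (pos_pattern, target_order) pairs.
    """
    # Target positions linked to each source word.
    tgt_positions = defaultdict(list)
    for s, t in alignment:
        tgt_positions[s].append(t)

    # Dependents of each head.
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def subtree(i):
        # All source words dominated by i, including i itself.
        nodes = [i]
        for c in children.get(i, ()):
            nodes.extend(subtree(c))
        return nodes

    rules = []
    for h, deps in list(children.items()):
        family = sorted(deps + [h])  # head and dependents in source order
        keys = {}
        for node in family:
            # The head counts as a single word; a dependent stands for its subtree.
            words = [node] if node == h else subtree(node)
            positions = [t for w in words for t in tgt_positions.get(w, ())]
            if not positions:
                break  # an unaligned unit gives no evidence; skip this family
            keys[node] = sum(positions) / len(positions)
        else:
            pattern = tuple(pos_tags[n] + ("*" if n == h else "") for n in family)
            order = tuple(sorted(range(len(family)), key=lambda k: keys[family[k]]))
            rules.append((pattern, order))
    return rules


# Toy example (assumed): "他 在 北京 工作" aligned to "He works in Beijing".
heads = [3, 3, 1, -1]              # 他 -> 工作, 在 -> 工作, 北京 -> 在, 工作 = root
pos_tags = ["PN", "P", "NR", "VV"]
alignment = {(0, 0), (3, 1), (1, 2), (2, 3)}
for rule in extract_reordering_rules(heads, pos_tags, alignment):
    print(rule)
```

In this toy case the family headed by the verb yields the pattern `('PN', 'P', 'VV*')` with target order `(0, 2, 1)`, meaning the prepositional dependent moves after the verb on the target side. A full system would additionally count how often each rule is observed in the corpus in order to estimate the reliability mentioned above.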

Keywords/Search Tags:

Reordering Model, Chinese Dependency Parser, Word Alignment, Machine Translation

Related items

1	Alignment Based Acquisition Of Collocation And Application In Machine Translation
2	Research About The Reordering In Hierarchical Phrase-based SMT
3	Research On The Method Of Chinese-old Bilingual Word Alignment And Dependency Tree Construction
4	Research On Chinese Word Segmentation Strategies For Statistical Machine Translation
5	Research On Bilingual Corpus-Based Machine Translation
6	Research On Mongolian Dependency Parsing Based On The Conversion Of Chinese-Mongolian Dependency Parsing Tree
7	The Research On English-Chinese Name Entity Translation
8	Morphology-Processing In Chinese-Mongolian Statistical Machine Translation
9	Research On Term Automatic Translation Technology In English-Chinese Machine Translation System
10	Research On Chinese-uyghur Word-alignment For Statistical Machine Translation