Exploring Method Of Domain Adaptation For Statistical Machine Translation

Posted on:2016-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:C Su

GTID:2298330467472828

Subject:Computer Science and Technology

Abstract/Summary:

Statistical machine translation, which is based on statistical models, has been regarded as the state of the art. It can acquire translation knowledge effectively and supports rapidly building translation systems with good performance. However, current statistical machine translation systems perform poorly when the domain changes. On the one hand, when a machine translation system is applied to Chinese-English patent translation, the domain change lowers Chinese word segmentation accuracy, which makes it difficult to extract correct translation knowledge. On the other hand, a new domain introduces a large number of new words that the existing translation knowledge does not cover.

To solve these problems, which result from domain change, we focus on domain adaptation for statistical machine translation and attempt to improve the accuracy and coverage of the extracted translation knowledge. The methods include domain-adaptive Chinese word segmentation for statistical machine translation and paraphrase technology, both of which aim at improving domain adaptation. In this thesis, we present our work in two aspects.

(1) To address domain adaptation in Chinese word segmentation, we implement Chinese word segmentation by exploiting n-gram statistical features from a raw corpus and bilingually motivated word segmentation information from a parallel corpus, respectively. We further propose a linear-model-based method to combine the multiple segmentation results, which provides effective Chinese word segmentation for statistical machine translation in different domains. For evaluation, we conduct Chinese word segmentation and Chinese-English machine translation experiments using data from the NTCIR-10 Chinese-English patent translation task. Experimental results show that the integrated method improves both the F-measure of Chinese word segmentation and the BLEU score of the Chinese-English statistical machine translation system.

(2) Extending the phrase table helps improve the coverage of unknown words from a new domain. However, large-scale, high-quality parallel corpora are a scarce resource. We therefore introduce additional paraphrases into statistical machine translation to improve domain adaptation. The idea is that, because of the diversity of natural language, the phrase table covers more at the level of meaning than at the level of surface phrases. Unknown words can thus be converted into paraphrases that the phrase table can translate properly. In this work, we acquire paraphrase knowledge through a third language, represent the multiple paraphrases of an input sentence in a lattice, and modify the statistical machine translation decoding algorithm to process the lattice. Experimental results show that, across training sets of different sizes, the proposed systems always outperform the traditional system, and that the proposed approach is robust.

In summary, to improve domain adaptation of statistical machine translation, this thesis proposes two methods that optimize statistical machine translation at two stages: extracting translation knowledge and decoding with translation knowledge. Experimental results show that our methods improve the performance of statistical machine translation in domain adaptation.
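
As an illustration of acquiring paraphrases through a third language, as described in (2), the following is a minimal sketch only. The abstract does not specify the scoring formula, so the sketch assumes the commonly used pivot formulation, in which the score of a paraphrase e2 for a phrase e1 is the sum over pivot phrases f of p(e2 | f) * p(f | e1). The function name `pivot_paraphrases`, the table layout, and the toy probabilities are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def pivot_paraphrases(src_to_pivot, pivot_to_src):
    """Score paraphrase candidates by summing over shared pivot phrases.

    src_to_pivot[e1][f] holds p(f | e1), and pivot_to_src[f][e2] holds
    p(e2 | f). The result maps e1 -> {e2: sum_f p(e2 | f) * p(f | e1)},
    skipping the trivial case e2 == e1.
    This is an assumed, generic pivot formulation, not the thesis's own.
    """
    table = defaultdict(dict)
    for e1, pivots in src_to_pivot.items():
        for f, p_f_given_e1 in pivots.items():
            for e2, p_e2_given_f in pivot_to_src.get(f, {}).items():
                if e2 == e1:
                    continue  # a phrase is not counted as its own paraphrase
                previous = table[e1].get(e2, 0.0)
                table[e1][e2] = previous + p_e2_given_f * p_f_given_e1
    return dict(table)


# Toy, made-up phrase-table fragments (hypothetical values for illustration).
src_to_pivot = {"automobile": {"voiture": 0.5, "auto": 0.5}}
pivot_to_src = {
    "voiture": {"car": 0.5, "automobile": 0.5},
    "auto": {"car": 0.25, "automobile": 0.75},
}

paraphrases = pivot_paraphrases(src_to_pivot, pivot_to_src)
print(paraphrases)
```

In a setup like the one the abstract describes, scored candidates of this kind would supply the alternative paths of the input lattice, so that an unknown phrase can be replaced by a paraphrase that the phrase table covers.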

Keywords/Search Tags:

statistical machine translation, domain adaptation, Chinese segmentation, paraphrase, lattice

Related items

1	Domain Adaptation For Statistical Machine Translation
2	Research On Semantics Analysis-based Domain Adaptation Reinforcement Method For Machine Translation
3	Domain Adaptation For Statistical Machine Translation
4	Research On Domain Adaptation For Statistical Machine Translation
5	Domain Adaptation For Statistical Machine Translation
6	Research On Domain Adaptation In Statistical Machine Translation Based On Clustering
7	Research On Specific-domain Monolingual Paraphrase Extraction In Automatic Evaluation Of Machine Translation
8	Exploring Method Of The Construction Of Parallel Corpus For Machine Translation In A Specific Domain
9	Adapting Machine Translation Models For Paraphrase Generation
10	Research On Domain Adaptation Methods For Neural Machine Translation