Research On Semantics Analysis-based Domain Adaptation Reinforcement Method For Machine Translation

Posted on:2018-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:L Yao

Full Text:PDF

GTID:2348330542465187

Subject:Computer Science and Technology

Abstract/Summary:

Domain adaptation for Machine Translation (MT) refers to the ability of a translation system or model to translate different types of text, from diverse domains or topics, with the same confidence. It is also reflected in the robustness, stability and portability of the translation system or model. However, current machine translation systems trained on large-scale parallel corpora perform poorly on domain-specific texts, for two reasons. First, the parallel training data contains translation knowledge and linguistic phenomena from many different domains, which introduces considerable noise when translating text from a specific domain. Second, when the domain of the text changes, a system trained on the existing parallel corpus cannot automatically adapt to the new domain. To address these problems, this thesis focuses on domain adaptation for Statistical Machine Translation (SMT) and studies the following topics:

(1) Topic Model based Data Selection for Domain-Specific Machine Translation

We use topic information as a new feature for data selection. Specifically, we compute the topic similarity between each sentence pair and the target-domain development set in order to select domain-relevant sentence pairs from a large-scale general-domain parallel corpus. The selected sentence pairs are then used to train a domain-specific machine translation system. However, individual sentence pairs are too short for their topics to be analyzed effectively. We therefore resort to a phrase-pair topic model that learns a topic distribution for each phrase pair; the topic representations of the parallel sentence pairs and of the target-domain development set are then inferred from the phrase pairs extracted from them.

(2) Translation Model Adaptation for SMT based on Semantic Similarity

An SMT system is trained on a large-scale parallel corpus covering diverse topics and domains, so when the domain of the text changes, translation quality usually drops dramatically. To solve this problem, we propose a novel translation model adaptation method based on the semantic similarity of phrase pairs. The approach first constructs a bilingual mapping between word vectors in the target domain, and then obtains the semantic k-nearest neighbors of the source-language phrase in the target vector space. Based on the distances between these k-nearest neighbors and the candidate translation in the general-domain space, we compute a domain-dependent translation similarity for each phrase pair. This similarity score is then integrated into the decoder as an additional feature to improve translation quality in the target domain.

(3) Combining Sentence and Document Information with Neural Network for Translation Model Optimization

In this work, we are interested in dynamically adapting the translation system when the origin of the test text is unknown. Specifically, we combine sentence-level and document-level information to improve translation selection. For each phrase pair, the source-language sentence and document containing the phrase serve as its context. Using a neural network, we first learn semantic representations of the phrase pair and its context, and then compute a matching score for the phrase pair with a multi-layer perceptron.
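The data-selection idea in (1) can be illustrated with a small sketch. It assumes the phrase-pair topic model has already been trained, so every phrase pair maps to a topic distribution. It also assumes two things the abstract does not specify: that a text's topic distribution is the average of its phrase pairs' distributions, and that similarity is measured by cosine. The names `infer_topics` and `select_sentence_pairs` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

PhrasePair = Tuple[str, str]
TopicDist = List[float]


def infer_topics(phrase_pairs: Sequence[PhrasePair],
                 phrase_topics: Dict[PhrasePair, TopicDist],
                 num_topics: int) -> TopicDist:
    """Assumed inference rule: average the topic distributions of the
    phrase pairs extracted from a text (a sentence pair or the dev set)."""
    total = [0.0] * num_topics
    seen = 0
    for pair in phrase_pairs:
        dist = phrase_topics.get(pair)
        if dist is None:
            continue  # phrase pair not covered by the topic model
        for k, p in enumerate(dist):
            total[k] += p
        seen += 1
    if seen == 0:
        return [1.0 / num_topics] * num_topics  # no evidence: uniform
    return [t / seen for t in total]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (the choice of measure is an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_sentence_pairs(corpus: Sequence[Sequence[PhrasePair]],
                          dev_topics: TopicDist,
                          phrase_topics: Dict[PhrasePair, TopicDist],
                          num_topics: int,
                          top_n: int) -> List[int]:
    """Indices of the top_n sentence pairs whose inferred topic
    distribution is most similar to the development set's."""
    scores = [cosine(infer_topics(pairs, phrase_topics, num_topics), dev_topics)
              for pairs in corpus]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]
```

In this sketch, `dev_topics` would itself be produced by `infer_topics` over the phrase pairs extracted from the development set. The selected indices then identify the subset of the general-domain corpus used to train the domain-specific system.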
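The semantic-similarity feature in (2) can be sketched as follows. The sketch assumes a linear mapping matrix `W` from source-language vectors into the target-domain target-language space has already been learned; how it is learned is not stated here. It also assumes two separate target-side embedding tables: one for the target domain, used for the k-nearest-neighbor search, and one for the general domain, used to compare neighbors with the candidate. Averaging cosine similarities into a single score is an illustrative choice, not necessarily the thesis's formula. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def project(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vec:
    """Map a source-language vector into the target-language space (W x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def k_nearest(query: Sequence[float], space: Dict[str, Vec], k: int) -> List[str]:
    """The k entries of `space` closest to `query` by cosine similarity."""
    return sorted(space, key=lambda w: cosine(query, space[w]), reverse=True)[:k]


def translation_similarity(src_vec: Sequence[float], candidate: str,
                           W: Sequence[Sequence[float]],
                           tgt_domain: Dict[str, Vec],
                           tgt_general: Dict[str, Vec], k: int) -> float:
    """Mean cosine similarity, in the general-domain space, between the
    candidate translation and the source phrase's k target-domain neighbors."""
    neighbors = k_nearest(project(W, src_vec), tgt_domain, k)
    if candidate not in tgt_general:
        return 0.0
    sims = [cosine(tgt_general[candidate], tgt_general[n])
            for n in neighbors if n in tgt_general]
    return sum(sims) / len(sims) if sims else 0.0
```

In a phrase-based decoder, a score like this would be added as one more feature in the log-linear model, and its weight would be tuned alongside the existing features.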
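The matching step in (3) can be pictured with a minimal forward pass. The thesis learns the phrase-pair and context representations with a neural network. As a stand-in, this sketch assumes each span is represented by the mean of its word vectors. It then concatenates the source phrase, target phrase, sentence context and document context, and scores the result with a one-hidden-layer perceptron (tanh hidden units, sigmoid output). The weights here are random placeholders rather than trained parameters, and `MatchingMLP` and `match_score` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Sequence

Vec = List[float]


def average_embedding(tokens: Sequence[str], emb: Dict[str, Vec], dim: int) -> Vec:
    """Assumed span representation: mean of the known word vectors."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


class MatchingMLP:
    """One tanh hidden layer and a sigmoid output. Weights are random
    placeholders here; in practice they would be trained."""

    def __init__(self, in_dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def score(self, x: Sequence[float]) -> float:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))


def match_score(src_phrase: Sequence[str], tgt_phrase: Sequence[str],
                sentence: Sequence[str], document: Sequence[str],
                src_emb: Dict[str, Vec], tgt_emb: Dict[str, Vec],
                dim: int, mlp: MatchingMLP) -> float:
    """Score a phrase pair against its source-side sentence and document context."""
    features = (average_embedding(src_phrase, src_emb, dim)
                + average_embedding(tgt_phrase, tgt_emb, dim)
                + average_embedding(sentence, src_emb, dim)
                + average_embedding(document, src_emb, dim))
    return mlp.score(features)
```

The MLP input size is four times the embedding dimension, one block per representation. Because the sentence and document are passed in at decoding time, the same phrase pair can receive different scores in different contexts, which is what allows adaptation when the test text's origin is unknown.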

Keywords/Search Tags:

Statistical Machine Translation, Domain Adaptation, Topic Information, Semantic Similarity, Translation Model Optimization

Related items

1	Domain Adaptation For Statistical Machine Translation
2	Research On Domain Adaptation In Statistical Machine Translation Based On Clustering
3	Research On Modal Adaptation For Image Description Translation
4	Research On Domain Adaptation For Statistical Machine Translation
5	Domain Adaptation For Statistical Machine Translation
6	Exploring Method Of Domain Adaptation For Statistical Machine Translation
7	Optimization On Translation Knowledge In Statistical Machine Translation
8	Domain Adaptation For Statistical Machine Translation
9	Research Of Optimization Methods Integration And Translation Rerank For Mongolian-chinese Machine Translation
10	Research On Chinese-mongolian Statistical Machine Translation Method For Limited Domain