
Research On Semantics Analysis-based Domain Adaptation Reinforcement Method For Machine Translation

Posted on: 2018-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Yao
GTID: 2348330542465187
Subject: Computer Science and Technology
Abstract/Summary:
Domain adaptation for Machine Translation (MT) refers to the ability of a translation system or model to translate texts from diverse domains or topics with the same confidence. It is also reflected in the robustness, stability, and portability of the system or model. However, current machine translation systems trained on a large-scale parallel corpus perform poorly on domain-specific texts. There are two reasons. First, the parallel training data contains translation knowledge and linguistic phenomena from many different domains, which introduces considerable noise when translating text in a specific domain. Second, when the domain of the text changes, a translation system trained on an existing parallel corpus cannot automatically adapt to the new domain. To address these problems, this thesis focuses on domain adaptation for Statistical Machine Translation (SMT) and studies the following:

(1) Topic Model based Data Selection for Domain-Specific Machine Translation
We use topic information as a new feature for data selection. In particular, we compute the topic similarity between each sentence pair and the target-domain development set in order to select domain-relevant sentence pairs from a large-scale general-domain parallel corpus. The selected sentence pairs are then used to train a domain-specific machine translation system. However, sentence pairs are too short for their topics to be analyzed effectively. We therefore resort to a phrase pair topic model that learns a topic distribution for each phrase pair. The topic representations of the parallel sentence pairs and of the target-domain development set are then inferred from the phrase pairs extracted from them.

(2) Translation Model Adaptation for SMT based on Semantic Similarity
An SMT system is trained on a large-scale parallel corpus covering diverse topics and domains, so when the domain of the text changes, translation quality usually drops dramatically. To solve this problem, we propose a novel translation model adaptation method based on the semantic similarity of phrase pairs. The approach first constructs a bilingual mapping between word vectors in the target domain, and then obtains the semantic k-nearest neighbors of the source-language phrase in the target vector space. Based on the distance between these k-nearest neighbors and the candidate translation in the general-domain space, we compute a domain-dependent translation similarity for each phrase pair. The similarity score is then integrated into the decoder as an additional feature to improve translation quality in the target domain.

(3) Combining Sentence and Document Information with Neural Network for Translation Model Optimization
In this work, we are interested in the dynamic adaptation of the translation system when the origin of the test text is unknown. Specifically, we combine sentence-level and document-level information to improve translation selection. For each phrase pair, we use the source-language sentence and document containing the phrase as its context. With the help of a neural network, we first learn semantic representations of the phrase pair and its context, and then compute the matching score of the phrase pair with a multi-layer perceptron.
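
As an illustration only, the data selection idea in (1) can be sketched in a few lines of Python. This is not the thesis implementation. The abstract does not state which similarity measure is used or how phrase-pair topics are combined into a sentence-level representation, so the sketch assumes cosine similarity and a plain average of phrase-pair topic distributions. The function names (`average_topics`, `cosine`, `select_sentence_pairs`) and the toy topic vectors are hypothetical.

```python
import math


def average_topics(phrase_topic_dists):
    """Average phrase-pair topic distributions into one vector.

    Assumption: a sentence pair is represented by the mean of the topic
    distributions of the phrase pairs extracted from it.
    """
    n = len(phrase_topic_dists)
    dim = len(phrase_topic_dists[0])
    return [sum(d[k] for d in phrase_topic_dists) / n for k in range(dim)]


def cosine(u, v):
    """Cosine similarity between two topic vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sentence_pairs(corpus, dev_topics, top_n):
    """Return the ids of the top_n sentence pairs closest to the dev set.

    corpus: list of (pair_id, [topic distribution of each phrase pair]).
    dev_topics: topic vector of the target-domain development set.
    """
    scored = [
        (cosine(average_topics(dists), dev_topics), pair_id)
        for pair_id, dists in corpus
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair_id for _, pair_id in scored[:top_n]]


# Toy data: three topics, three general-domain sentence pairs.
dev_topics = [0.8, 0.1, 0.1]
corpus = [
    ("pair_a", [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]),
    ("pair_b", [[0.1, 0.8, 0.1]]),
    ("pair_c", [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]),
]
selected = select_sentence_pairs(corpus, dev_topics, top_n=2)
print(selected)
```

In this toy setting, the sentence pair dominated by the second topic (`pair_b`) is the one filtered out, since the development set is dominated by the first topic. In practice, the topic distributions would come from a trained phrase pair topic model rather than being written by hand.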
Keywords/Search Tags:Statistical Machine Translation, Domain Adaptation, Topic information, Semantic Similarity, Translation Model Optimization