
Domain Adaptation of Statistical Machine Translation Based on Context Information

Posted on: 2016-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2405330542957268
Subject: Computer application technology
Abstract/Summary:
Statistical machine translation (SMT) is a technology that uses computers to translate text from a source language into a target language. It draws on natural language processing (NLP), artificial intelligence (AI), and computational linguistics. Machine translation is now one of the most important and most challenging subjects in the modern world. Mainstream SMT systems are built on phrase-based translation. The main steps in phrase-based translation are analysing the bilingual corpus, extracting the phrases that carry translation information, and producing the translation through a complex process of rule combination, pruning, and reordering. Accuracy, efficiency, and elegance are the goals of translation.

Studying existing machine translation systems, we found that they use only context-free grammar to perform translation, considering neither the topic of the text to be translated nor the context within the sentence. As a result, traditional SMT fails at domain-adaptive translation. This thesis explores these shortcomings and adds further models to the SMT system. The models add context information to the translation rules, so that domain adaptation can be performed in the decoding phase and better translation performance can be achieved. The domain adaptation models comprise:

(1) Use topic information for translation rule selection. The same source-language segment may be translated into different target-language segments in different domains, and traditional SMT does not exploit this. This thesis adds topic features to the translation rules and uses the degree of match to build a topic similarity model and a topic sensitivity model, which compute the topic similarity between a translation rule and the document to be translated.

(2) Use context information for translation rule selection at the sentence level. The same source phrase may be translated into different target phrases depending on the sentence context, and it may be reordered according to syntax. This thesis uses a maximum entropy (maxent) classifier to build a translation rule selection model and a reordering model, and uses these two models for translation rule selection and phrase reordering. Adding these two models to an SMT system can improve translation performance.

The experimental results show that adding context information features to the decoding phase of SMT achieves better translation quality.
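The topic-based rule selection in item (1) can be illustrated with a minimal sketch. The abstract does not say how topic similarity is measured, so the code below makes several assumptions: rule and document topics are represented as probability vectors over the same topic set, similarity is taken as one minus the Hellinger distance, and the rule entries, their field names, and the pinyin target strings are hypothetical. It is not the thesis's actual model.

```python
import math


def hellinger_similarity(p, q):
    """Return 1 - Hellinger distance between two topic distributions.

    Assumption: p and q are probability vectors over the same topics.
    The result is 1.0 for identical distributions and 0.0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - math.sqrt(squared / 2.0)


def select_rule(candidates, doc_topics):
    """Pick the candidate rule whose topic distribution best matches the document."""
    return max(
        candidates,
        key=lambda rule: hellinger_similarity(rule["topics"], doc_topics),
    )


# Hypothetical rules for one ambiguous source phrase; topic 0 is "finance",
# topic 1 is "geography". Targets are written in pinyin for illustration only.
rules = [
    {"source": "bank", "target": "yinhang", "topics": [0.9, 0.1]},  # financial sense
    {"source": "bank", "target": "hean", "topics": [0.2, 0.8]},     # riverside sense
]

finance_doc = [0.85, 0.15]  # hypothetical topic distribution of the input document
best = select_rule(rules, finance_doc)
print(best["target"])
```

In a full decoder such a score would more plausibly enter as one feature among many than as a hard selection; the hard `max` here only keeps the sketch short.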
Keywords/Search Tags:SMT, Domain adaptation, Topic model, Rule selection, Reorder