
Continuous-Space Based Statistical Machine Translation

Posted on: 2017-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Wang
GTID: 1368330590990816
Subject: Computer Science and Technology
Abstract/Summary:
In informal usage, natural language, or human language, is understood as a culturally specific communication system [1]. Natural Language Processing (NLP, also called computational linguistics) employs computational techniques to learn, understand, and produce human language content [2]. Statistical Machine Translation (SMT), one of the most popular areas of NLP, is a machine translation paradigm in which translations are generated by statistical models whose parameters are derived from the analysis of bilingual text corpora. It contrasts with rule-based machine translation approaches. Phrase-based SMT is widely considered the state of the art. Recently, as computing power has grown, continuous-space methods, especially Neural Network (NN) based methods, have become popular. NN methods are used to improve the translation model or the language model, or are integrated directly into end-to-end SMT systems.

Although continuous-space methods have been shown to help in many SMT tasks, they also suffer from some shortcomings: 1) Continuous-space methods rely on non-linear algorithms, which yield high performance but make training and decoding much slower. 2) Most continuous-space methods, especially NN based methods, learn features automatically, so some useful features, such as semantic information, are missing.

In this thesis, we study continuous-space SMT from two aspects: NN based methods and graph based methods. For NN based methods, 1) we propose a continuous-space language model conversion method, which accelerates the decoding of NN language models while maintaining their accuracy; 2) making use of the generation ability of NNs, we combine connecting phrase methods with NN methods to enhance SMT adaptation and generation. For graph based methods, we propose a novel bilingual sense unit, Bilingual Contexonym Cliques (BCCs). BCCs describe the senses of a word better than simple document or sliding-window information. The Bilingual Graph-based Semantic Model (BGSM) effectively models the representation of word senses rather than of the words themselves. The proposed model is applied to phrase pair translation probability estimation and phrase pair generation for SMT. We have empirically evaluated the proposed methods on the IWSLT and NIST data sets and compared them with existing state-of-the-art methods. Empirical results show that the proposed methods outperform the existing methods in both performance and efficiency.
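For context, phrase-based SMT conventionally estimates the phrase pair translation probabilities that BGSM is applied to by relative frequency over phrase pairs extracted from a word-aligned bilingual corpus: p(t | s) = count(s, t) / count(s). The sketch below shows only that standard baseline, not the thesis's BGSM. The function name `relative_frequency` and the toy phrase pairs are hypothetical, and phrase extraction itself is assumed to have already happened.

```python
from collections import Counter


def relative_frequency(phrase_pairs):
    """Baseline estimate p(tgt | src) = count(src, tgt) / count(src).

    `phrase_pairs` is a list of (source_phrase, target_phrase) tuples,
    assumed to be already extracted from a word-aligned bilingual corpus.
    """
    pair_counts = Counter(phrase_pairs)                  # count(src, tgt)
    src_counts = Counter(src for src, _ in phrase_pairs)  # count(src)
    return {
        (src, tgt): c / src_counts[src]
        for (src, tgt), c in pair_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy phrase pairs, not data from the thesis.
    pairs = [
        ("la maison", "the house"),
        ("la maison", "the house"),
        ("la maison", "the home"),
        ("maison", "house"),
    ]
    for (src, tgt), p in sorted(relative_frequency(pairs).items()):
        print(f"p({tgt!r} | {src!r}) = {p:.3f}")
```

Because this estimate depends only on surface co-occurrence counts, rare phrase pairs receive unreliable probabilities. Sense-level models such as BGSM are motivated by exactly this kind of sparsity.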
Keywords/Search Tags:Statistical Machine Translation, Continuous-Space Models, Neural Network Models, Bilingual Graph Semantic Models, Language Models