Statistical Machine Translation Reranking And Application For Specific Field

Posted on: 2018-01-11  Degree: Master  Type: Thesis
Country: China  Candidate: X W Li  Full Text: PDF
GTID: 2348330512997175  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid progress of globalization and the growth of multinational corporations, communication among countries has become more frequent, and the need for translation between languages has become more urgent. Statistical machine translation is promising because of its effectiveness and efficiency. However, current statistical machine translation systems still need improvement in many respects. Researchers have proposed reranking methods to improve the performance of statistical machine translation in the general domain, but these methods can be improved further. When a statistical machine translation system is applied to a specific field, such as commodity-related areas, certain terms must be translated correctly and accurately even with limited training data. In addition, the process of training a system on in-domain data should be friendlier to people who are not experts: a clear and easy training process makes statistical machine translation more accessible. We therefore carry out work on the following three aspects.

Firstly, we enhance reranking methods for statistical machine translation. This part consists of two steps. 1) We introduce a convolutional neural network to enhance ranking methods. We find that current reranking methods use only one fixed combination of the coordinate features of two instances, while other ranking methods fail to compare two instances at the level of atomic features. We therefore use a convolutional neural network to make better comparisons by learning more combinations of the coordinate features of two instances. We design a series of ranking architectures and run experiments with each one to examine its performance. 2) We run experiments on other tasks to demonstrate the versatility of our architectures, for example on data from information retrieval and from identifying Chinese coordinate structures. The results show that our architectures perform consistently well.

Secondly, we explore several ways to obtain better performance on commodity-related data. Although in-domain data is limited, terms from the commodity field require accurate translations. We believe that more in-domain data leads to better translations of in-domain data, and more out-of-domain data leads to better translations of out-of-domain data. However, too much out-of-domain data slows down training, so the proportion of the two must be chosen with care to keep training efficient. We also observe that many commodity-related terms consist of several words. Experiments show that these terms can be translated accurately with or without word segmentation.

Thirdly, we visualize the whole process of training a phrase-based statistical machine translation system. When a system is applied to a specific field, users should train it on their own data to obtain better translations of the terms in that field. Visualization makes it easier for non-professionals to configure parameters and run the training scripts. We split the process into several steps and show the related parameters on corresponding web pages. Parameters are detected automatically and shown in drop-down boxes for users to choose from. When a step finishes, its page also reports whether the step completed correctly.
Keywords/Search Tags: Statistical Machine Translation, Ranking Methods, Convolutional Neural Network, Visualization
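
The pairwise comparison idea behind reranking, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the thesis's architecture: the thesis learns feature combinations with a convolutional neural network, whereas the sketch below uses a fixed linear comparator as a stand-in. It also assumes that "coordinate features" means features at the same index in two candidates, which is an interpretation rather than something the abstract states. The function names, the weights, and the feature values are all hypothetical.

```python
from typing import List, Sequence, Tuple

# A candidate is (translation text, feature vector). The features are
# hypothetical, e.g. a language model score and a translation model score.
Candidate = Tuple[str, Sequence[float]]


def pair_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Combine same-index features of two candidates by subtraction.

    This is the single fixed combination used in this sketch; a learned
    model (such as a CNN) could learn other combinations instead.
    """
    return [x - y for x, y in zip(a, b)]


def prefers_first(a: Sequence[float], b: Sequence[float],
                  weights: Sequence[float]) -> bool:
    """Return True if the linear comparator scores candidate a above b."""
    score = sum(w * d for w, d in zip(weights, pair_features(a, b)))
    return score > 0


def rerank(nbest: List[Candidate], weights: Sequence[float]) -> List[str]:
    """Order an n-best list by the number of pairwise comparisons won."""
    wins = [0] * len(nbest)
    for i, (_, feats_i) in enumerate(nbest):
        for j, (_, feats_j) in enumerate(nbest):
            if i != j and prefers_first(feats_i, feats_j, weights):
                wins[i] += 1
    # Stable sort: candidates with equal wins keep their original order.
    order = sorted(range(len(nbest)), key=lambda i: -wins[i])
    return [nbest[i][0] for i in order]


if __name__ == "__main__":
    nbest = [
        ("candidate a", [-2.0, -1.0]),
        ("candidate b", [-1.0, -0.5]),
        ("candidate c", [-3.0, -0.2]),
    ]
    print(rerank(nbest, weights=[1.0, 1.0]))
```

With a linear comparator on feature differences, counting pairwise wins gives the same order as sorting by a weighted sum of features (up to ties). Pairwise comparison only becomes more expressive once the comparator is replaced by a model that can learn non-linear combinations of the paired features.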