
Research On Domain Adaptation For Statistical Machine Translation

Posted on: 2016-08-30    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y G Zhao    Full Text: PDF
GTID: 1108330482452165    Subject: Computer application technology
Abstract/Summary:
In the internet era of the 21st century, dramatically increasing amounts of information and frequent cross-lingual communication have greatly enlarged the demand for language translation. Traditional human translation cannot meet this demand in terms of either scale or efficiency, so translation is expected to be carried out automatically by computers, which drives the need for machine translation technology. Among the various machine translation approaches, statistical machine translation (SMT) has found widespread application thanks to its strong learning capability and high efficiency.

The basic idea of SMT is to extract translation knowledge from large amounts of bilingual corpora by statistical methods and then apply this knowledge to the translation task. Under the state-of-the-art log-linear framework, the component models (including the language and translation models) are built from the training corpus, the feature weights are tuned on an independent development set, and the final system performance is evaluated on a specific test set. In current research and practice, domain differences may exist among the training, development and test data, which affects both the accuracy of model construction and the weight tuning of the SMT system.

This thesis studies domain adaptation for statistical machine translation, covering the following aspects.

First, to tackle the domain difference between training and test data, the thesis proposes model adaptation based on neural networks, in which words are represented as low-dimensional dense vectors. This overcomes the data sparseness and efficiency problems that traditional approaches encounter under discrete word representations. Within the direct-decoding framework of neural network models, both implicit and explicit scenarios are considered:

1. Model adaptation based on implicit domain information. The thesis fine-tunes the neural network language and translation models on in-domain development data, so that the originally domain-independent models become domain-specific (a minimal fine-tuning sketch is given after this list).

2. Model adaptation based on explicit domain information. The thesis adds explicit topic information to a feedforward neural network language model, helping the SMT system select appropriate translation candidates during decoding.
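To illustrate the implicit adaptation in item 1, the sketch below continues training (fine-tunes) a small feedforward n-gram language model on in-domain batches with a small learning rate. This is a minimal sketch only: the FeedforwardLM class, the hyper-parameters and the file name in the usage comment are illustrative assumptions, not details taken from the thesis.

    # Minimal sketch: fine-tune a feedforward n-gram language model on
    # in-domain data (illustrative only; not the thesis implementation).
    import torch
    import torch.nn as nn

    class FeedforwardLM(nn.Module):
        """Simple n-gram LM: embed the (n-1) context words, predict the next word."""
        def __init__(self, vocab_size, context_size=4, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, context):                       # context: (batch, context_size)
            e = self.embed(context).flatten(1)             # (batch, context_size * embed_dim)
            return self.out(torch.tanh(self.hidden(e)))    # next-word logits

    def fine_tune(model, in_domain_batches, epochs=3, lr=1e-4):
        """Continue training a general-domain model on in-domain (context, target)
        batches with a small learning rate, making it domain-specific."""
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for context, target in in_domain_batches:
                optimizer.zero_grad()
                loss = criterion(model(context), target)
                loss.backward()
                optimizer.step()
        return model

    # Usage (hypothetical checkpoint and data):
    # model = FeedforwardLM(vocab_size=50000)
    # model.load_state_dict(torch.load("general_domain_lm.pt"))
    # model = fine_tune(model, in_domain_batches)

Only a few epochs and a small learning rate are used, so that the general-domain knowledge of the original model is preserved while it adapts to the target domain.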
Second, to tackle the domain mismatch between development and test data, the thesis proposes weight adaptation solutions for both static and dynamic scenarios:

1. When the development data is static, the original weights are adjusted using information from the test data, through cross-entropy-based language model weight adaptation and transductive minimum error rate training. Both methods alleviate the weight bias that arises when the weights rely solely on the development data.

2. When the development data is dynamic, the thesis proposes a weight adaptation strategy based on development data construction: using a score-vector representation, a subset of the original development set that is close to the test set is selected and used to learn new weights (see the sketch below). This overcomes the lack of a quantitative domain-similarity measure in traditional manual selection.

Experimental results show that on the model adaptation task the proposed neural model-based methods achieve significant improvements, while on the weight adaptation task the proposed methods select well-matched development data for different test sets, ensuring stable system performance across varied test data.
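To make the development data construction in item 2 above concrete, here is a minimal sketch under simple assumptions: each sentence is summarized by a vector of decoder feature scores, the test set is represented by the centroid of its vectors, and the development sentences nearest to that centroid in Euclidean distance are kept for weight tuning. The feature set, the distance measure and the function name select_dev_subset are illustrative assumptions rather than the exact formulation of the thesis.

    # Minimal sketch: choose a development subset whose score vectors are
    # close to the test set (illustrative; not the thesis's exact method).
    import numpy as np

    def select_dev_subset(dev_vectors, test_vectors, k):
        """dev_vectors: (n_dev, n_features) score vectors of development sentences.
        test_vectors: (n_test, n_features) score vectors of test sentences.
        Returns indices of the k development sentences closest to the test centroid."""
        centroid = test_vectors.mean(axis=0)
        distances = np.linalg.norm(dev_vectors - centroid, axis=1)  # Euclidean distance
        return np.argsort(distances)[:k]

    # Toy usage: three feature scores per sentence (e.g. LM, TM, length penalty).
    dev = np.array([[0.1, 0.7, 0.2],
                    [0.9, 0.1, 0.5],
                    [0.2, 0.6, 0.3]])
    test = np.array([[0.15, 0.65, 0.25],
                     [0.20, 0.70, 0.20]])
    print(select_dev_subset(dev, test, k=2))  # indices of the two closest dev sentences

The selected subset is then used for weight tuning, so that the learned weights match the domain of the incoming test data.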
Keywords/Search Tags: statistical machine translation, domain adaptation, neural network, topic modelling, transductive learning, cross entropy