Exploring Method Of The Construction Of Parallel Corpus For Machine Translation In A Specific Domain

Posted on: 2017-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Shan
Full Text: PDF
GTID: 2308330482987210
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation is the field that uses computers to translate between different natural languages. Statistical machine translation, which acquires translation knowledge from a parallel corpus and can yield efficient, high-quality translation systems, is currently the most widely used method. However, such systems perform poorly when the domain changes, because they lack translation knowledge for the target domain. Constructing a large-scale, high-quality parallel corpus for the target domain therefore plays an important role in improving the performance of a statistical machine translation system. However, constructing a parallel corpus manually is very costly, and a parallel corpus constructed purely by machine translation is of very low quality.

To solve these problems, we focus on cost-efficient parallel corpus construction based on a combination of human translation and machine translation. We present two parallel corpus construction methods, chosen according to the availability of translators for the specific language pair:

(1) When there are not enough translators available for manual translation, we present a parallel corpus construction method based on a pivot language: a third language serves as a bridge for constructing a parallel corpus between the target language pair, using existing machine translation technology combined with active learning. We describe the pivot-based construction method, the domain adaptation method based on active learning, the method for selecting good translations based on automatic translation evaluation, the retraining of the translation system, and the evaluation experiments. We do so through a case study in which a Japanese-Chinese parallel corpus is constructed with English as the pivot language, using mature phrase-based statistical machine translation technology.
Our experimental results suggest that this approach can construct a Japanese-Chinese parallel corpus rapidly and improve Japanese-Chinese machine translation performance efficiently.

(2) To improve the quality of a parallel corpus constructed by machine translation, we apply the dependency-to-string model to Japanese-Chinese parallel corpus construction. The model exploits the syntactic and semantic information encoded in the dependency tree to build the translation model. The method also uses domain adaptation based on active learning, improving the quality of the parallel corpus by improving the performance of the translation system. We focus on the dependency-to-string translation model for Japanese-Chinese statistical machine translation and carry out evaluation experiments through a case study of constructing a Japanese-Chinese parallel corpus. The experimental results show that the BLEU score and the RIBES score increase by 0.62 and 0.31 respectively, which indicates that the model can effectively improve the performance of the translation system.

In summary, for machine translation system development in a specific domain, this thesis presents two approaches for constructing parallel corpora at low cost and with high efficiency, and demonstrates their effectiveness and feasibility.
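The abstract does not give implementation details, so the following is only a minimal sketch of how pivot-based construction with translation selection might be organized. Everything in it is an assumption made for illustration: the function name `build_pivot_corpus`, the translator and scorer callables, the threshold, and the choice to route low-scoring sentences to human translators. The toy dictionaries stand in for real Japanese-English and English-Chinese translation systems, and the toy scorer stands in for an automatic evaluation metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: a translator returns a list of candidate translations,
# and a scorer returns a quality estimate for a (source, target) pair.
Translator = Callable[[str], List[str]]
Scorer = Callable[[str, str], float]


def build_pivot_corpus(
    sources: List[str],
    src_to_pivot: Translator,
    pivot_to_tgt: Translator,
    score: Scorer,
    threshold: float,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Translate source -> pivot -> target and keep the best-scoring candidate.

    Returns (accepted sentence pairs, sources left for human translation).
    """
    accepted: List[Tuple[str, str]] = []
    needs_human: List[str] = []
    for src in sources:
        # Chain the two systems: every pivot candidate yields target candidates.
        candidates = [
            tgt for pivot in src_to_pivot(src) for tgt in pivot_to_tgt(pivot)
        ]
        if not candidates:
            needs_human.append(src)
            continue
        best = max(candidates, key=lambda tgt: score(src, tgt))
        if score(src, best) >= threshold:
            accepted.append((src, best))
        else:
            # Assumption: low-scoring sentences are handed to human translators.
            needs_human.append(src)
    return accepted, needs_human


# Toy stand-ins for real translation systems and a real evaluation metric.
JA_EN: Dict[str, List[str]] = {"猫": ["cat"], "犬": ["dog"]}
EN_ZH: Dict[str, List[str]] = {"cat": ["猫"], "dog": ["狗"]}
TOY_SCORES: Dict[Tuple[str, str], float] = {("猫", "猫"): 0.9, ("犬", "狗"): 0.4}

accepted, needs_human = build_pivot_corpus(
    sources=["猫", "犬", "鳥"],
    src_to_pivot=lambda s: JA_EN.get(s, []),
    pivot_to_tgt=lambda s: EN_ZH.get(s, []),
    score=lambda s, t: TOY_SCORES.get((s, t), 0.0),
    threshold=0.5,
)
print(accepted)
print(needs_human)
```

In a loop of this kind, the human-translated sentences would be added to the training data before the translation system is retrained, which is one plausible reading of how active learning and retraining fit together in the first method.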
Keywords/Search Tags: parallel corpus, statistical machine translation, active learning, pivot language, domain adaptation, dependency-to-string