Translation Knowledge Acquisition In Corpus-based Machine Translation

Posted on: 2015-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Yin
Full Text: PDF
GTID: 2268330425489077
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation is the use of computers to translate text or speech between two different languages, and it is an experimental discipline. With the development of Internet technology and machine translation technology, machine translation systems are now applied to manual localization, travel conversation and cross-language information retrieval. Although great progress has been made, many problems remain unresolved. The translation of long sentences is one of them: both translation quality and decoding speed fall far short of what real applications require. Corpus-based machine translation, which trains translation models on parallel sentence pairs, is a mainstream approach. This paper focuses on the translation of long sentences and proposes methods for automatically acquiring high-quality translation knowledge within the frameworks of example-based machine translation (EBMT) and statistical machine translation (SMT), two representative approaches to corpus-based machine translation.

In example-based machine translation (EBMT), translation examples serve as the translation knowledge. Because word order can differ considerably between the two languages of a pair, translation examples extracted using word alignment information alone often contain errors. To resolve this problem, we propose using dependency structure information to constrain the extraction of translation examples and thereby reduce errors. In this way, long-distance collocation information can be extracted and then used to adjust word order. Additionally, dependency structures can be used as features in the decoder to improve translation quality. Based on this method, we implement an EBMT system that comprises a translation example extraction module, a translation example retrieval module and a translation generation module.

In statistical machine translation (SMT), the phrase table serves as the translation knowledge. An automatically extracted phrase table inevitably contains a large number of erroneous and redundant phrase pairs, which waste time and space during decoding and harm translation quality. To resolve this problem, we propose a phrase table filtering method in which a virtual context is introduced to calculate the incremental language model score of a phrase pair. By considering the maximum and minimum incremental scores obtained from the virtual context, we design a filtering strategy that re-ranks phrase pairs. A phrase table filter is implemented based on the proposed algorithm.

In summary, this paper presents a method for extracting translation examples by exploiting dependency structure information, and designs a phrase table filtering algorithm that introduces virtual context, both for corpus-based machine translation. To verify the methods in a practical application, we conduct evaluations on the one million Chinese-English patent parallel sentence pairs of the international open evaluation NTCIR-9. The experimental results for our EBMT system show that its performance is close to that of the "KYOTO" system, an EBMT system that achieved the best performance in NTCIR-9. The experimental results for our phrase table filter show that when the phrase table was reduced to 47% of its original size, translation quality improved slightly; when it was reduced to 30% of its original size, translation quality declined only slightly. These results indicate that the method can effectively filter out redundant phrase pairs from the phrase table. Overall, the evaluation results show that the proposed methods are effective at automatically acquiring high-quality translation knowledge for corpus-based machine translation.
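
The phrase table filtering idea summarized in the abstract (bounding a phrase pair's language model score increment over virtual contexts, then re-ranking) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy bigram language model, the choice of single preceding words as virtual contexts, the function names, and the pruning rule, which drops a pair when enough rival translations of the same source phrase score better even in their worst context than the pair does in its best context.

```python
from collections import defaultdict

def lm_increment(bigram_logprob, context_word, target_words, unk=-10.0):
    """Log-probability a toy bigram LM adds when target_words follow context_word."""
    total, prev = 0.0, context_word
    for word in target_words:
        total += bigram_logprob.get((prev, word), unk)  # unseen bigrams get `unk`
        prev = word
    return total

def score_bounds(tm_logprob, target, contexts, bigram_logprob):
    """Return (pessimistic, optimistic) scores over all virtual contexts."""
    incs = [lm_increment(bigram_logprob, c, target.split()) for c in contexts]
    return tm_logprob + min(incs), tm_logprob + max(incs)

def filter_phrase_table(table, contexts, bigram_logprob, keep=1):
    """Hypothetical pruning rule: drop a pair if at least `keep` rivals for the
    same source phrase have a pessimistic score above its optimistic score."""
    by_source = defaultdict(list)
    for src, tgt, tm in table:
        lo, hi = score_bounds(tm, tgt, contexts, bigram_logprob)
        by_source[src].append((tgt, tm, lo, hi))
    kept = []
    for src, cands in by_source.items():
        for tgt, tm, lo, hi in cands:
            dominators = sum(1 for t2, _, lo2, _ in cands if t2 != tgt and lo2 > hi)
            if dominators < keep:
                kept.append((src, tgt, tm))
    return kept

# Toy data (all values invented for illustration).
bigram = {
    ("<s>", "patent"): -2.0, ("of", "patent"): -1.5,
    ("<s>", "patents"): -2.5, ("of", "patents"): -1.0,
}
table = [
    ("zhuanli", "patent", -0.2),
    ("zhuanli", "patents", -1.0),
    ("zhuanli", "the of", -3.0),  # noisy pair that the filter should remove
]
filtered = filter_phrase_table(table, ["<s>", "of"], bigram, keep=1)
print(filtered)
```

In this toy run the noisy pair is removed because both rivals outscore it under every virtual context, while the two plausible translations survive because neither dominates the other. A real filter would use a full n-gram language model and contexts drawn from the training data.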
Keywords/Search Tags:statistical machine translation, example-based machine translation, translation example, dependency structure information, phrase table, virtual context, decoder, translation knowledge acquisition, phrase table filter