Parallel text mapping of web-based bilingual corpus materials

Posted on: 2010-09-09
Degree: Ph.D.
Type: Thesis
University: Carleton University (Canada)
Candidate: Zhu, Qibo
Full Text: PDF
GTID: 2448390002486386
Subject: Artificial Intelligence
Abstract/Summary:
The chief objective of this thesis is to design and develop a Bitext Mapping Intelligent Agent (BMIA), a computational model that can be used to pair and compare translated texts. BMIA has two main components. The first is the StatCan Daily Translation Extraction System (SDTES), which automatically extracts translations from web-based materials to construct the StatCan Daily Corpus (SDC). Alongside it, a translation concordance system (TransConcord) has been developed to provide ready access to SDC and other bilingual corpora. The second component is the StatCan Bilingual Text Comparison System (TextComp), which aligns and compares bilingual texts for translation discrepancy detection and Translation Correspondence Profiling (TCPro). To deal with potentially noisier data sets in the translation checking process, different text mapping algorithms have been designed to parse the input texts, align them, and scan through them to detect translation discrepancies. To give a more detailed picture of translation correspondences, TextComp maps translations at a finer-grained level: the translation constituent level. A TCPro scaling metric is designed to compute a TCPro score for each aligned segment pair so that levels of translation correspondence can be estimated and distinguished. This scale-based view can help in identifying correspondence deviations and in objectively assessing the faithfulness of translations. The two component systems in BMIA not only support human translators but also shed light on machine translation, translation studies, and translation quality assessment.
Keywords/Search Tags: BMIA, Translation, Text, Mapping, Bilingual
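
The abstract describes scoring each aligned segment pair so that weak correspondences stand out. The thesis's actual TCPro metric is not given here, so the following is only a minimal sketch of the general idea, assuming a simple character-length ratio as a stand-in score. The names `SegmentPair`, `length_ratio_score`, and `flag_discrepancies`, and the 0.5 threshold, are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentPair:
    """One aligned source/target segment pair (hypothetical structure)."""
    source: str
    target: str


def length_ratio_score(pair: SegmentPair) -> float:
    """Return a score in [0, 1] based on character-length ratio.

    This is an illustrative stand-in, NOT the TCPro metric:
    1.0 means both segments have equal length, 0.0 means one
    side is empty while the other is not.
    """
    src_len = len(pair.source.strip())
    tgt_len = len(pair.target.strip())
    if src_len == 0 and tgt_len == 0:
        return 1.0  # two empty segments trivially correspond
    if src_len == 0 or tgt_len == 0:
        return 0.0  # one side missing: likely an omission
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def flag_discrepancies(pairs: List[SegmentPair], threshold: float = 0.5) -> List[int]:
    """Return indices of pairs whose score falls below the threshold."""
    return [i for i, p in enumerate(pairs) if length_ratio_score(p) < threshold]


pairs = [
    SegmentPair("The index rose 2% in May.", "L'indice a augmenté de 2 % en mai."),
    SegmentPair("Exports fell.", ""),  # target missing
    SegmentPair("", ""),
]
flagged = flag_discrepancies(pairs)
```

A length ratio only catches gross omissions or insertions; a constituent-level profile of the kind the abstract describes would need lexical and structural evidence as well.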