
Research On Improvement Of Grapheme-Based English Chinese Machine Transliteration

Posted on: 2008-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2178360245998010
Subject: Computer Science and Technology
Abstract/Summary:
Machine transliteration is the automatic replacement of proper nouns in a source language with their approximate phonetic or spelling equivalents in a target language. Compared with machine translation, transliteration does not involve semantic translation, and it follows a sequential ordering rule. Transliteration is therefore a weaker form of translation, yet it has both theoretical significance and practical value for cross-language applications. Growing globalization demands effective and efficient worldwide information access across language barriers, so machine transliteration has attracted increasing attention.

Transliteration faces a severe challenge between languages that use different writing systems, such as English and Chinese. Because these languages have very different sound and writing systems, the transliteration process is complicated, and many factors degrade performance. This thesis surveys all 16 papers on transliteration published at major NLP conferences, then builds a system and conducts related research on grapheme-based English/Chinese machine transliteration. First, we explore the impact of increasing corpus size on transliteration performance. Second, we integrate the EM algorithm with discriminative training, which we call the EMD algorithm; EMD aims to increase the accuracy of transliteration unit alignment. Third, we study semi-supervised learning and explore the impact of different labeled data on the discriminative model. Overall, this thesis addresses ways to improve an English/Chinese machine transliteration system.

In detail, this thesis conducts the following research:

1. This thesis explores the impact of increasing corpus size on transliteration performance. We apply a grapheme-based approach to English/Chinese machine transliteration. Under this approach, the thesis investigates the NCM (noisy-channel model) and the JSCM (joint source-channel model) in modeling orthographic contextual information and orthographic mapping. We test transliteration performance on corpora of two sizes (37,668 and 60,000).

2. This thesis applies the EM algorithm and the EMD training algorithm to align transliteration units, and compares the effect of the EM and EMD alignments on machine transliteration performance. Tests show that the alignments produced by EMD yield better transliteration performance.

3. This thesis introduces semi-supervised machine learning and its application to machine transliteration. We use labeled data obtained through different strategies and discuss their influence. We carry out open and closed tests on transliteration unit alignment. Tests show that semi-supervised learning helps improve transliteration performance.
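
For readers unfamiliar with the two models named in item 1, the block below sketches how the noisy-channel model and the joint source-channel model are commonly formulated in the transliteration literature. It is not taken from the thesis and may differ from its exact formulation. The symbols are assumptions introduced here: E is the English name, C the Chinese name, both segmented into K aligned transliteration units e_k and c_k, and n is the n-gram order. A single fixed alignment is assumed for simplicity.

```latex
% Noisy-channel model (NCM): decompose P(C | E) with Bayes' rule into a
% channel (mapping) model and a target-language model (bigram shown here).
\hat{C} = \arg\max_{C} P(C \mid E) = \arg\max_{C} P(E \mid C)\, P(C),
\qquad
P(E \mid C) \approx \prod_{k=1}^{K} P(e_k \mid c_k),
\qquad
P(C) \approx \prod_{k=1}^{K} P(c_k \mid c_{k-1})

% Joint source-channel model (JSCM): model the sequence of aligned unit
% pairs <e, c>_k directly with an n-gram over the pairs.
P(E, C) \approx \prod_{k=1}^{K}
  P\bigl(\langle e, c\rangle_k \mid \langle e, c\rangle_{k-n+1}^{k-1}\bigr),
\qquad
\hat{C} = \arg\max_{C} P(E, C)
```

In this reading, the NCM captures context only on the Chinese side through P(C), while the JSCM conditions each unit pair on the preceding pairs, so orthographic context on both sides and the mapping itself are modeled jointly.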
Keywords/Search Tags: machine transliteration, EM, EMD, semi-supervised machine learning