Research On English-Chinese Name Transliteration

Posted on: 2015-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: D D Wang
Full Text: PDF
GTID: 2298330467986324
Subject: Computer system architecture
Abstract/Summary:
Name transliteration converts a name from a source language into a target language according to the pronunciation differences between the two languages. It plays an important role in multilingual processing tasks such as machine translation and bilingual corpus alignment. Our research focuses on how to build an effective English-Chinese transliteration model for names of English origin, and on a method for recognizing name origin. In view of the characteristics of English-Chinese name transliteration and its existing problems, a method based on syllabification and phrase table optimization is proposed. Name transliteration can be described as syllable-based translation, which is solved by applying a phrase-based statistical machine translation model. First, after analyzing the problems of current syllabification methods, an improved syllabification method is proposed. Second, to address the noise in the phrase table caused by the small scale of the training corpus, we put forward three filtering methods: one based on eliminating low-frequency words, one based on C-value, and one based on cohesion. Experiments showed that the C-value based method effectively eliminates the noise. Third, by integrating a location feature, we rerank the generated candidates, which slightly alleviates the problem of incorrectly generated transliteration candidates. Finally, a two-stage syllabification is proposed to reduce the transliteration errors caused by overly coarse syllable partitions. The experimental results showed that these four methods improved performance, raising transliteration accuracy from 63.08% to 67.62%. Names of different origins follow different pronunciation systems, so recognizing the origin of a name before transliterating it can improve the transliteration result.
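The C-value based phrase table filtering mentioned above can be illustrated with a minimal sketch. The abstract does not say how C-value is computed over phrase table entries, so everything here is an assumption: the score follows the standard C-value definition from term extraction (frequency discounted by the average frequency of longer entries that contain the candidate), the length weight uses `log2(len + 1)` so that single-syllable entries do not all score zero, and the names `c_values`, `contains`, and `filter_phrases`, the threshold, and the toy counts are hypothetical.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_values(freq):
    """Score each syllable sequence (a tuple) given its corpus frequency.

    Assumed variant: weight = log2(len + 1) instead of log2(len).
    """
    scores = {}
    for cand, f_cand in freq.items():
        # Frequencies of longer entries in which this candidate is nested.
        nesting = [f for other, f in freq.items()
                   if len(other) > len(cand) and contains(other, cand)]
        weight = math.log2(len(cand) + 1)
        if nesting:
            scores[cand] = weight * (f_cand - sum(nesting) / len(nesting))
        else:
            scores[cand] = weight * f_cand
    return scores


def filter_phrases(freq, threshold=0.0):
    """Keep only entries whose C-value exceeds a (hypothetical) threshold."""
    scores = c_values(freq)
    return {cand for cand, score in scores.items() if score > threshold}


# Toy counts, invented for illustration only.
toy_freq = {("jo",): 10, ("hn",): 8, ("jo", "hn"): 8}
kept = filter_phrases(toy_freq)
```

In this toy case `("hn",)` never occurs outside `("jo", "hn")`, so its score drops to zero and it is filtered out, while `("jo",)` keeps a positive score because it also appears on its own.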
For name origin recognition, a method combining rules and statistics is proposed. First, pinyin and Japanese pronunciation rules are used to divide names into four categories. Second, a statistical classifier performs the final origin recognition. The features are character-based and pronunciation-based N-gram language models together with a location feature. Experiments were conducted with different combinations of these features. The results showed that, for classifying name origin, the combination of a character-based 4-gram, a pronunciation-based 2-gram, and the location feature yields better performance, with an accuracy of about 98.39%.
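The statistical stage of origin recognition can be sketched roughly as follows. The keywords list Naive Bayes, but the abstract gives no details of the classifier, so this is only an assumed illustration: a multinomial Naive Bayes over character n-grams with add-one smoothing. The class `NgramNaiveBayes`, the `^`/`$` boundary padding, the labels, and the toy names are all hypothetical, and the pronunciation-based n-grams and the location feature are left out.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n):
    """Character n-grams of a name, padded with assumed boundary markers."""
    padded = "^" + name.lower() + "$"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n=4):
        self.n = n
        self.class_counts = Counter()
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for gram in char_ngrams(name, self.n):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / total)  # log prior
            denom = sum(self.gram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(name, self.n):
                logp += math.log((self.gram_counts[label][gram] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy training data, invented for illustration; n=2 because the set is tiny.
train_names = ["smith", "johnson", "williams",
               "zhang", "wang", "zhao",
               "tanaka", "suzuki", "yamada"]
train_labels = ["english"] * 3 + ["chinese"] * 3 + ["japanese"] * 3
model = NgramNaiveBayes(n=2).fit(train_names, train_labels)
```

With this toy model, `model.predict("zhou")` picks the label whose training names share the most boundary and interior bigrams with the input. A real system would need a far larger name list and, following the abstract, the additional pronunciation and location features.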
Keywords/Search Tags: Syllabification, Phrase table optimization, Pronunciation rules, Naive Bayes, N-gram language model