
Research On Automatic Katakana Translation Technology

Posted on: 2011-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Gao
Full Text: PDF
GTID: 2178360302988568
Subject: Computer application technology
Abstract/Summary:
With the advancement of technology and the growth of international technical exchange, new technical terms in one language are often borrowed directly into another language rather than being expressed with newly created words. Japanese frequently imports vocabulary from other languages, primarily (but not exclusively) from English. It has a special phonetic script, katakana, which is used primarily to write loanwords. Because dictionaries and corpora can cover only part of a language, dictionary-based machine translation systems cannot translate every katakana word, so translating out-of-vocabulary katakana words back into English is a key translation problem.

This thesis uses statistical machine translation methods to translate katakana automatically. The complete katakana phrase translation pipeline consists of katakana phrase word segmentation, translation of single katakana words, bi-directional integration of the translation results, generation of the phrase translation, and automatic evaluation of the output.

The work of this thesis mainly includes:

First, unlike previous dictionary-based word segmentation methods, we use a machine learning method for katakana phrase word segmentation. We treat katakana phrase word segmentation as a sequence labeling problem, use the surrounding characters as features (with a character window of size 3), and introduce a conditional probabilistic model for word segmentation. The experimental results indicate that katakana phrase word segmentation based on the conditional probabilistic model achieves higher word segmentation precision.

Second, we employ a phrase-based statistical machine translation model for katakana translation and propose a bi-directional integration translation strategy built on the phrase-based machine translation method.
A Japanese-English system first translates each katakana word into candidate English words, and an English-Japanese system then translates those candidates back into Japanese. The results of the two passes are integrated to obtain the final translation results. The experimental results show that our method outperforms a purely phrase-based Japanese-English katakana translation method.

Third, we use the bi-directional integration scores and a language model as features to obtain the English phrase translation of a katakana phrase. Decoding is performed with the Viterbi algorithm. Together, these components form a complete katakana phrase translation system.

In this thesis, we use translation accuracy and an internationally accepted automatic evaluation method to evaluate the translations of katakana phrases. The evaluation results show that the proposed method effectively addresses the katakana translation problem.
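
The decoding step named in the third contribution can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how the two translation directions are combined or how the language model is weighted, so the log-linear interpolation in `integrate_scores`, the bigram language model, the weights `alpha` and `lm_weight`, and all candidate words and probabilities below are assumptions made up for illustration.

```python
import math

def integrate_scores(forward_logp, backward_logp, alpha=0.5):
    """Hypothetical bi-directional integration: interpolate the
    Japanese-to-English and English-to-Japanese log scores.
    The abstract does not give the actual combination rule."""
    return alpha * forward_logp + (1.0 - alpha) * backward_logp

def viterbi_phrase(candidates, bigram_logprob, lm_weight=1.0):
    """Pick the best English word sequence for a segmented katakana phrase.

    candidates: one dict per katakana word, mapping each candidate
                English word to its integrated log score.
    bigram_logprob: function (previous_word, word) -> log probability.
    """
    if not candidates:
        return []
    # best[i][w] = (score of the best path ending in w at position i, backpointer)
    best = [{
        w: (s + lm_weight * bigram_logprob("<s>", w), None)
        for w, s in candidates[0].items()
    }]
    for i in range(1, len(candidates)):
        layer = {}
        for w, s in candidates[i].items():
            # Choose the predecessor that maximizes path score plus LM transition.
            prev, score = max(
                ((p, ps + lm_weight * bigram_logprob(p, w))
                 for p, (ps, _) in best[i - 1].items()),
                key=lambda t: t[1],
            )
            layer[w] = (score + s, prev)
        best.append(layer)
    # Backtrace from the highest-scoring final word.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    path = [word]
    for i in range(len(best) - 1, 0, -1):
        word = best[i][word][1]
        path.append(word)
    return path[::-1]

# Toy data (invented): two katakana words, two English candidates each,
# scored as (forward probability, backward probability).
candidates = [
    {"machine": integrate_scores(math.log(0.7), math.log(0.5)),
     "mashine": integrate_scores(math.log(0.3), math.log(0.1))},
    {"translation": integrate_scores(math.log(0.6), math.log(0.6)),
     "transration": integrate_scores(math.log(0.4), math.log(0.05))},
]

# Toy bigram language model with a small floor for unseen pairs.
BIGRAMS = {
    ("<s>", "machine"): math.log(0.5),
    ("machine", "translation"): math.log(0.4),
}

def toy_bigram_logprob(prev, word):
    return BIGRAMS.get((prev, word), math.log(1e-4))

print(viterbi_phrase(candidates, toy_bigram_logprob))  # ['machine', 'translation']
```

Working in log space keeps the path score a simple sum of per-word integration scores and language model transitions, which is the form the Viterbi recurrence needs.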
Keywords/Search Tags: Katakana, Word Segmentation, Bi-directional Integration, Conditional Random Field, Statistical Machine Translation