
Transliteration Of Cambodian And Chinese Names Based On Nonparametric Bayesian Model

Posted on: 2018-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Lei
GTID: 2358330518460490
Subject: Software engineering
Abstract/Summary:
Name transliteration is an important task in natural language processing, with applications in cross-language information retrieval, machine translation, and related areas. Compared with other language pairs, research on Khmer-Chinese name transliteration is still at a preliminary stage because corpora are small and foundational research is lacking. This thesis focuses on Khmer-Chinese name transliteration. Its main contributions are as follows:
1. Khmer-Chinese name transliteration based on a nonparametric Bayesian algorithm and conditional random fields (CRFs). This chapter presents a transliteration method that combines a nonparametric Bayesian algorithm with CRFs. A Khmer name syllable segmentation algorithm is implemented based on Dirichlet process theory: Khmer person names are split into Khmer syllables by this algorithm, and a CRF is then used to build the Khmer-Chinese name transliteration model. The transliteration accuracy is 46.5%.
2. Khmer-Chinese name transliteration based on the hierarchical Dirichlet process (HDP). In this chapter, we propose an HDP-based method for many-to-many alignment between Khmer names and Chinese names. Based on HDP theory, a Khmer-Chinese bilingual name syllable alignment algorithm is implemented and used to align Khmer-Chinese name pairs collected from the Internet. The aligned corpus serves as training data for a transliteration model built with Moses. Evaluated on the test corpus, the model achieves a precision of 51.6%, a recall of 47.5%, and an F-measure of 49.47%.
3. Construction of a Khmer-Chinese name transliteration system based on the hierarchical Dirichlet process. The HDP-based transliteration method is integrated into a Khmer-Chinese name transliteration system, which is deployed as an online service using an open-source Web framework.
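To illustrate what Dirichlet-process syllable segmentation (contribution 1) involves, here is a minimal sketch of a Gibbs sampler for a Dirichlet-process unigram segmenter in the style of Goldwater et al. It is not the thesis's algorithm. Assumptions: names are plain strings (romanized here for readability), the base measure `base_prob` is a hypothetical geometric-length, uniform-character distribution, the utterance-boundary terms of the original model are omitted, and `alpha`, `p_stop`, and all function names are illustrative.

```python
import random
from collections import Counter


def base_prob(word: str, alphabet_size: int, p_stop: float = 0.5) -> float:
    """Hypothetical base measure P0: geometric word length times uniform characters."""
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1.0 / alphabet_size) ** len(word)


def words_of(name: str, bounds: list[bool]) -> list[str]:
    """Split a non-empty name; bounds[k] is True if there is a boundary after name[k]."""
    out, start = [], 0
    for cut, is_boundary in enumerate(bounds, start=1):
        if is_boundary:
            out.append(name[start:cut])
            start = cut
    out.append(name[start:])
    return out


def segment_names(names, alpha=1.0, p_stop=0.5, iters=200, seed=0):
    """Resample every potential boundary using CRP predictive probabilities."""
    rng = random.Random(seed)
    alphabet = len({ch for name in names for ch in name})

    def p0(word: str) -> float:
        return base_prob(word, alphabet, p_stop)

    # Random initial segmentation, then word counts over the whole corpus.
    bounds = [[rng.random() < 0.5 for _ in range(len(n) - 1)] for n in names]
    counts = Counter(w for n, b in zip(names, bounds) for w in words_of(n, b))
    total = sum(counts.values())

    for _ in range(iters):
        for name, b in zip(names, bounds):
            for k in range(len(b)):
                cut = k + 1
                # Nearest boundaries to the left and right of this cut.
                left = max((c for c in range(1, cut) if b[c - 1]), default=0)
                right = min((c for c in range(cut + 1, len(name)) if b[c - 1]),
                            default=len(name))
                w, w1, w2 = name[left:right], name[left:cut], name[cut:right]
                # Remove the word(s) touching this cut from the counts.
                if b[k]:
                    counts[w1] -= 1
                    counts[w2] -= 1
                    total -= 2
                else:
                    counts[w] -= 1
                    total -= 1
                # "No boundary": one word w; "boundary": w1 followed by w2.
                p_join = (counts[w] + alpha * p0(w)) / (total + alpha)
                p_split = ((counts[w1] + alpha * p0(w1)) / (total + alpha)
                           * (counts[w2] + (w1 == w2) + alpha * p0(w2))
                           / (total + 1 + alpha))
                b[k] = rng.random() < p_split / (p_split + p_join)
                # Add the sampled word(s) back.
                if b[k]:
                    counts[w1] += 1
                    counts[w2] += 1
                    total += 2
                else:
                    counts[w] += 1
                    total += 1
    return [words_of(n, b) for n, b in zip(names, bounds)]


if __name__ == "__main__":
    # Hypothetical romanized Khmer-style names, for illustration only.
    names = ["sokha", "sokhom", "dara", "darith", "sophea", "sopheap"]
    for segs in segment_names(names):
        print("-".join(segs))
```

The Dirichlet process prior favors reusing units already seen elsewhere in the corpus (the `counts[...]` terms) while the concentration `alpha` controls how easily new units are created, which is why recurring syllables tend to be split out consistently across names.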
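The many-to-many syllable alignment in contribution 2 can be pictured with a simplified sketch: a monotone Viterbi alignment in which each unit pairs one or more source tokens with one or more target tokens and is scored by a Chinese-restaurant-process predictive probability. This is an assumption-laden illustration, not the thesis's HDP model: it uses a single (non-hierarchical) Dirichlet process, a constant base probability `p0`, fixed unit counts, and hypothetical toy syllable pairs; `align` and its parameters are illustrative names.

```python
import math
from collections import Counter


def align(src, tgt, counts, total, alpha=1.0, max_src=2, max_tgt=2, p0=1e-3):
    """Monotone many-to-many Viterbi alignment of two token sequences.

    Each unit pairs 1..max_src source tokens with 1..max_tgt target tokens and is
    scored by the CRP predictive probability (counts[unit] + alpha*p0) / (total + alpha).
    Returns the best list of (source_chunk, target_chunk) units, or None if no
    alignment exists under the chunk-size limits.
    """
    n, m = len(src), len(tgt)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -math.inf:
                continue
            for di in range(1, max_src + 1):
                for dj in range(1, max_tgt + 1):
                    if i + di > n or j + dj > m:
                        continue
                    unit = (tuple(src[i:i + di]), tuple(tgt[j:j + dj]))
                    prob = (counts[unit] + alpha * p0) / (total + alpha)
                    score = best[i][j] + math.log(prob)
                    if score > best[i + di][j + dj]:
                        best[i + di][j + dj] = score
                        back[i + di][j + dj] = (i, j)
    if best[n][m] == -math.inf:
        return None
    # Follow back-pointers from the end to recover the aligned units.
    units, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        units.append((tuple(src[pi:i]), tuple(tgt[pj:j])))
        i, j = pi, pj
    return units[::-1]


if __name__ == "__main__":
    # Hypothetical unit counts, e.g. from earlier sampling iterations.
    counts = Counter({(("so",), ("索",)): 3, (("kha",), ("卡",)): 3})
    print(align(["so", "kha"], ["索", "卡"], counts, total=6))
```

In a full sampler, an alignment step like this would be repeated over the corpus with counts updated after each name pair; a hierarchical Dirichlet process would additionally share statistics across related distributions rather than using one flat `counts` table.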
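As a quick consistency check on the HDP results, taking 51.6% as precision and 47.5% as recall, the balanced F-measure (the harmonic mean of the two) reproduces the reported 49.47%. The function name below is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the HDP-based model, in percent.
p, r = 51.6, 47.5
print(f"F-measure = {f_measure(p, r):.2f}%")  # F-measure = 49.47%
```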
Keywords/Search Tags: Khmer-Chinese, Dirichlet Process, Hierarchical Dirichlet Processes, Transliteration Model