Research On The Transliteration From Pinyin To Chinese Characters And Normalization For Chinese Sentences

Posted on: 1999-03-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Q Zong
GTID: 1118360185495585
Subject: Computer applications
Abstract/Summary:
The transliteration from Pinyin to Chinese characters and the normalization of Chinese sentences are two hard problems in Chinese information processing. This dissertation is set in the context of a Chinese-to-English speech translation system and investigates both problems further.

In a Chinese-to-English speech translation system, Pinyin-to-character transliteration and the normalization of Chinese sentences are the critical links between the acoustic signal recognizer and the machine translator. Research on these problems is significant not only for implementing speech-to-speech translation systems, but also has theoretical and practical value for research on human-computer speech communication and natural language interfaces.

There are two main techniques for implementing Pinyin-to-character transliteration: analysis methods based on linguistic knowledge and statistical methods based on corpora. Having analyzed both, the author presents a trial and backtracking (TB) model and proposes that the segmentation of the Pinyin sequence and the recognition of homophone words be processed in an integrated way. A Pinyin-Hanzi transliteration (PHT) algorithm based on the TB model has been designed and implemented.

The PHT algorithm uses the results of checking the context of candidate homophone words as heuristic information for segmenting the syllable sequence. Blind segmentation of the syllable sequence is thus avoided, and segmentation accuracy is improved. The TB model and the approach to recognizing homophone words help to find and prune invalid solution paths promptly, and to reduce the combinatorial explosion caused by the large number of homophone words.

An intelligent transliteration and processing (ITP) system based on the TB model has been developed by the author.
In the ITP system, a method for recognizing homophone words based on multiple knowledge sources is proposed for the...
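
The abstract does not give the PHT algorithm itself, so the following is only a minimal sketch of the general trial-and-backtracking idea it describes, in which segmentation and homophone selection are handled in one search. The lexicon, the table of compatible adjacent words, and the names `LEXICON`, `COMPATIBLE`, `context_ok`, and `transliterate` are all hypothetical stand-ins for illustration; they are not the dissertation's knowledge sources or implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: unsegmented Pinyin key -> homophone candidates.
LEXICON: Dict[str, List[str]] = {
    "wo": ["我", "握"],
    "qu": ["取", "去"],
    "xi": ["西", "洗"],
    "an": ["安", "按"],
    "xian": ["先", "西安"],
}

# Hypothetical context knowledge: pairs of words allowed to be adjacent.
# "<s>" marks the start of the sentence.
COMPATIBLE: Set[Tuple[str, str]] = {
    ("<s>", "我"),
    ("我", "取"),   # passes the local check but leads to a dead end below
    ("我", "去"),
    ("去", "西安"),
}


def context_ok(prev: str, word: str) -> bool:
    """Context check on a candidate word (assumed here to be a simple pair lookup)."""
    return (prev, word) in COMPATIBLE


def transliterate(pinyin: str, pos: int = 0, prev: str = "<s>") -> Optional[List[str]]:
    """Segment and convert pinyin[pos:] by trial and backtracking.

    Returns a list of words, or None if no segmentation passes the context check.
    """
    if pos == len(pinyin):
        return []
    # Trial step: every lexicon key that matches at this position, longest first.
    keys = sorted(
        (k for k in LEXICON if pinyin.startswith(k, pos)), key=len, reverse=True
    )
    for key in keys:
        for word in LEXICON[key]:
            if not context_ok(prev, word):
                continue  # prune: this homophone does not fit its context
            rest = transliterate(pinyin, pos + len(key), word)
            if rest is not None:
                return [word] + rest
            # Otherwise backtrack and try the next homophone or a shorter key.
    return None


print(transliterate("woquxian"))
```

In this toy run the search first tries 取 for "qu", finds no continuation that passes the context check, backtracks, and then succeeds with 去 followed by 西安. The context check therefore decides both which homophone is kept and where the syllable sequence is cut.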
Keywords/Search Tags: Transliteration