
The Key Technology Research And Implementation Of The Pinyin-to-character Conversion System

Posted on: 2016-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: S H Lv
Full Text: PDF
GTID: 2308330473955814
Subject: Communication and Information System
Abstract/Summary:
Pinyin-to-character conversion is the technology by which a computer automatically converts a string of Pinyin into the corresponding sequence of Chinese characters. It is an important but challenging research topic in Chinese information processing, with wide application in speech recognition and Pinyin-based character input. The key techniques of a Pinyin-to-character conversion system are the language model, Pinyin string segmentation, and the decoding algorithm. In this thesis, first, the zero-probability problem that arises when training a language model is analyzed, and three methods are applied for data smoothing. The three algorithms are tested, and the experimental results show that the entropy values of the language models trained with these smoothing methods lie between 5 and 7. Second, to address long-distance dependence in Chinese, an improved Chinese Frequent String (CFS) extraction algorithm is proposed that overcomes the limited number of layers in conventional CFS extraction and eliminates meaningless CFS words. Both the improved method and the traditional method are used to extract CFS for training language models, which are then applied in Pinyin-to-character conversion experiments. The results show that the improved method achieves higher conversion accuracy than the traditional one. Third, the jieba segmentation tool is modified to improve its segmentation, because the original tool neglects the relations between characters. In addition, because rule-based Pinyin string segmentation cannot resolve segmentation ambiguity, a knowledge base is introduced, which improves the accuracy rate by 0.9%. Finally, a Pinyin-to-character conversion system is designed. It includes a learning component that improves the user experience by recording and absorbing users' input habits. Using the Viterbi algorithm, the system reaches a conversion accuracy of 90.3%.
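
To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding over a smoothed character bigram model. It is not the thesis's implementation: the abstract does not name its three smoothing methods, so add-one (Laplace) smoothing is assumed here, and the candidate table `CANDIDATES`, the training sentences `CORPUS`, and the helper names `train_bigram`, `log_prob`, and `viterbi` are all invented for illustration.

```python
import math
from collections import Counter

# Toy data, invented for illustration only (not from the thesis).
CANDIDATES = {
    "zhong": ["中", "种", "重"],
    "guo": ["国", "过", "果"],
    "ren": ["人", "认"],
}
CORPUS = ["中国人", "中国", "中国人", "种果", "认重"]
START = "<s>"  # sentence-start marker used as the first context


def train_bigram(corpus):
    """Count character bigrams and how often each character is a context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        prev = START
        for ch in sentence:
            bigrams[(prev, ch)] += 1
            contexts[prev] += 1
            prev = ch
    return bigrams, contexts


def log_prob(prev, ch, bigrams, contexts, vocab_size):
    """Add-one smoothed log P(ch | prev); unseen bigrams never get zero probability."""
    return math.log((bigrams[(prev, ch)] + 1) / (contexts[prev] + vocab_size))


def viterbi(syllables, candidates, bigrams, contexts):
    """Return the most probable character string for a list of Pinyin syllables."""
    vocab_size = len({c for chars in candidates.values() for c in chars})
    # best[ch] = (log probability of the best path ending in ch, that path)
    best = {START: (0.0, "")}
    for syl in syllables:
        nxt = {}
        for ch in candidates[syl]:  # raises KeyError for an unknown syllable
            nxt[ch] = max(
                (score + log_prob(prev, ch, bigrams, contexts, vocab_size), path + ch)
                for prev, (score, path) in best.items()
            )
        best = nxt
    return max(best.values())[1]


bigrams, contexts = train_bigram(CORPUS)
print(viterbi(["zhong", "guo", "ren"], CANDIDATES, bigrams, contexts))  # 中国人
```

The sketch assumes the Pinyin string has already been segmented into syllables. Because each step keeps only the best path ending in each candidate character, the cost grows linearly with the number of syllables rather than with the number of possible character sequences.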
Keywords/Search Tags: Pinyin-to-character conversion, language model, Pinyin string segmentation, Chinese Frequent String (CFS)