The Research Of Unknown Chinese Word Recognition And Its Application To Chinese Input Method

Posted on: 2006-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2178360155967463
Subject: Computer application technology
Abstract/Summary:
To address the lack of dynamic updating of unknown Chinese words in the input method word code library (IMWCL) of Chinese input systems, this thesis proposes a method that collects a Chinese corpus from the Internet and recognizes unknown Chinese words through fragment word segmentation and word combination extraction. It implements a system, named ZHHZ-UCWRS, that dynamically updates the word code library of a Chinese input system. The thesis first describes how to collect text from news web sites and build the Chinese corpus from which unknown Chinese words are extracted. It then focuses on how to extract unknown Chinese words from this corpus. It proposes a fragment word segmentation method that assigns each item a priority value according to its frequency and length and uses a greedy algorithm to segment each fragment, yielding the unknown Chinese words the fragment contains. Building on fragment word segmentation, it proposes a word combination method that builds a bi-gram model and applies mutual information and rule filtering to extract unknown Chinese words from the corpus. Finally, frequency filtering removes the unknown Chinese words that are unsuitable for the IMWCL. The remaining words are assigned input codes according to the coding rules of the input method and are added, in the required format, to the self-defined text of the Chinese input system, which completes the dynamic update of the IMWCL. The proposed method, which combines fragment word segmentation and word combination, recognizes unknown Chinese words with a precision of 81.25% and a recall of 82.38%. The method proposed in this thesis for recognizing unknown Chinese words for the IMWCL is a useful reference for the study of unknown Chinese words.
The implemented system, ZHHZ-UCWRS, meets users' requirements and has good practical value.
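
The word combination step described above scores adjacent items with mutual information over a bi-gram model. The following is a minimal sketch of that general idea, not the thesis's implementation. It uses pointwise mutual information, and the function names `bigram_pmi` and `candidate_words`, the thresholds `min_pmi` and `min_count`, and the sample tokens are all assumptions introduced for illustration. The abstract does not give the actual formula, the filtering rules, or the threshold values.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(left, right): PMI in bits} for every adjacent token pair.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from raw counts in the token sequence.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (left, right), count in bigram_counts.items():
        p_pair = count / n_bigrams
        p_left = unigram_counts[left] / n_unigrams
        p_right = unigram_counts[right] / n_unigrams
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


def candidate_words(tokens, min_pmi=1.0, min_count=2):
    """Join adjacent tokens whose PMI and count both pass the thresholds.

    min_count is a hypothetical stand-in for the frequency filtering
    mentioned in the abstract; the rule filtering step is not modelled.
    """
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    scores = bigram_pmi(tokens)
    return sorted(
        left + right
        for (left, right), score in scores.items()
        if score >= min_pmi and bigram_counts[(left, right)] >= min_count
    )


if __name__ == "__main__":
    # Hypothetical, already-segmented fragment items from a corpus.
    sample = ["博", "客", "很", "好", "博", "客", "是", "新", "词", "我", "看", "博", "客"]
    print(candidate_words(sample))
```

In this sketch a pair is kept only when it is both strongly associated and frequent enough, which mirrors the order of steps in the abstract: mutual information first, then frequency filtering.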
Keywords/Search Tags: Unknown Chinese Word Recognition, Chinese Input Method, Fragment Word Segmentation, Word Combination Extraction