A Study on Statistics-Based Post-processing for Chinese Character Recognition

Posted on: 2004-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: T Peng
GTID: 2168360122461171
Subject: Computer applications
Abstract/Summary:
With the rapid development of computer and network technology, the large amount of text found in daily life on various kinds of media needs to be digitized. OCR (Optical Character Recognition) technology emerged to raise efficiency and lighten this workload. In recent years, research on Chinese character OCR has made substantial progress, and many commercial recognition systems have entered the market successfully. However, Chinese characters are structurally complex and highly variable, which often limits the recognition rate for individual characters. Raising the recognition rate by relying on single-character recognition alone has become very difficult. It is therefore necessary to add a post-processing stage on top of single-character recognition that uses linguistic knowledge and the contextual information of the text.

This thesis introduces the significance of research on Chinese character recognition post-processing and surveys some of its methods. It then adopts a statistics-based method to post-process the results of single-character recognition. By counting every adjacent pair in the full-year 2000 text of the "People's Daily" (about 19,300,000 words), we obtain the probabilistic relationships between Chinese characters. Following the Markov language model, these probabilistic relationships are applied to Chinese character post-processing. This raises the recognition rate of the whole system to a certain extent.
Keywords/Search Tags: Chinese character recognition, Post-processing, Text counting, Probability of two adjacent words, Markov model
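
The approach described in the abstract (count adjacent pairs in a corpus, then use a Markov model to choose among the recognizer's candidates) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names, the add-one smoothing, the per-position candidate lists with confidence scores, and the Viterbi search are all assumptions made for illustration, and the tiny corpus string stands in for the "People's Daily" text.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count single characters and adjacent character pairs in a corpus string."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    return unigrams, bigrams


def log_transition(prev, cur, unigrams, bigrams, vocab_size):
    """Add-one smoothed log P(cur | prev). The smoothing scheme is an assumption."""
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def postprocess(candidates, unigrams, bigrams):
    """Choose the most probable character sequence with a Viterbi search.

    candidates: one list per text position of (character, confidence) pairs,
    where confidence is a hypothetical recognizer score in (0, 1].
    """
    if not candidates:
        return ""
    vocab_size = max(len(unigrams), 1)
    # Each column maps a candidate character to (best log score, previous character).
    columns = [{ch: (math.log(conf), None) for ch, conf in candidates[0]}]
    for position in candidates[1:]:
        column = {}
        for ch, conf in position:
            score, prev = max(
                (columns[-1][p][0]
                 + log_transition(p, ch, unigrams, bigrams, vocab_size), p)
                for p in columns[-1]
            )
            column[ch] = (score + math.log(conf), prev)
        columns.append(column)
    # Backtrack from the best-scoring final character.
    ch = max(columns[-1], key=lambda c: columns[-1][c][0])
    path = [ch]
    for i in range(len(columns) - 1, 0, -1):
        ch = columns[i][ch][1]
        path.append(ch)
    return "".join(reversed(path))


# Toy corpus standing in for the real newspaper text.
unigrams, bigrams = train_bigram("人民日报人民日报人民大会")
# The recognizer (hypothetically) prefers the wrong character at position 2.
lattice = [[("人", 0.9)], [("民", 0.4), ("昆", 0.5)], [("日", 0.9)]]
result = postprocess(lattice, unigrams, bigrams)
```

In this toy case the pair statistics outweigh the recognizer's slight preference for the wrong second character, so `result` is "人民日". Working in log space avoids numeric underflow on long texts.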