Design and Implementation of a Mixed Chinese/English Text Recognition System

Posted on: 2008-11-13   Degree: Master   Type: Thesis
Country: China   Candidate: Z Li   Full Text: PDF
GTID: 2208360212999920   Subject: Software engineering
Abstract/Summary:
With the rapid growth of international technical exchange, multi-language documents are becoming increasingly common. This raises a new research topic in document recognition: the recognition of multi-language documents. In China, documents that mix Chinese and English are very common. Because the languages differ, the characters in a document must first be classified by language and then recognized with different methods.

Based on a study of current OCR systems and related technologies, this thesis presents a recognition system for mixed Chinese/English characters. The work is as follows:

First, to improve the quality of preprocessing and overcome the shortcomings of conventional character segmentation methods, this thesis introduces a novel approach to segmenting mixed Chinese/English characters that is based on periods and recognition. By employing a new line-segmentation algorithm, the approach produces a more precise line-segmentation result. By using a new character classification algorithm and a new algorithm for merging Chinese character components, it produces a better segmentation result for mixed Chinese/English characters.

Second, this thesis introduces an efficient architecture for character recognition. It provides a portable, extensible platform for developers and users. On this platform, users can change the recognition workflow dynamically to achieve a better recognition rate, and maintaining the system becomes more convenient.

Finally, through an in-depth study of the key steps in the recognition process, we analyze and compare various algorithms, along with the advantages and disadvantages of each in different scenarios.

In summary, using the algorithms above, the recognition system achieves a recognition rate of 95% at a speed of 6 s per one hundred Chinese characters in a pure Chinese context. In a mixed Chinese/English context, the English character recognition rate is more than 85%.
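The abstract does not give the details of the thesis's segmentation algorithms, so the following is only a minimal sketch of the general idea, assuming a conventional projection-profile approach rather than the author's method. It splits a binary image into text lines, splits a line into column spans, and merges adjacent narrow spans into one roughly square cell, as is often done for Chinese characters made of left and right components. All function names, the `max_gap` parameter, and the "width no larger than line height" rule are illustrative assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background
Span = Tuple[int, int]   # half-open index range [start, end)


def runs(profile: List[int]) -> List[Span]:
    """Return the half-open ranges where the projection profile is non-zero."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at a blank position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    """Split the page into text lines using the horizontal projection."""
    return runs([sum(row) for row in img])


def segment_columns(img: Image, top: int, bottom: int) -> List[Span]:
    """Split one text line into column spans using the vertical projection."""
    width = len(img[0]) if img else 0
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


def merge_components(spans: List[Span], line_height: int, max_gap: int = 1) -> List[Span]:
    """Merge neighbouring spans while the merged width stays within line_height.

    Assumption (not from the thesis): a Chinese character occupies a roughly
    square cell, so close components that still fit in such a cell are joined.
    """
    merged: List[Span] = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            close_enough = start - prev_end <= max_gap
            fits_square = end - prev_start <= line_height
            if close_enough and fits_square:
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged
```

In a sketch like this, spans that remain much narrower than the line height after merging could be treated as candidate English letters or punctuation, while near-square spans could be treated as candidate Chinese characters. A real system would need to handle skew, noise, and touching characters, none of which is covered here.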
Keywords/Search Tags: Chinese character recognition, character segmentation, mixed characters, feature extraction