As a major part of pattern recognition, Optical Character Recognition (OCR) plays an important role in areas such as information processing, office automation, postal systems and banking systems. This thesis focuses on a probability-based extraction method, covering word, text-line and text-block extraction, in our OCR system. For the preprocessing stage of OCR, several methods for two problems, image binarization and word deslanting, are also discussed, and the methods to be used are selected on that basis.

The thesis is organized as follows.

First, the author introduces the workflow of our OCR system and briefly describes the extraction method, arguing that extraction should be based on a mathematical model rather than an empirical one.

Second, the probability model and its algorithm are presented, with emphasis on the detailed steps of the algorithm. The projection method is also briefly introduced so that its experimental results can be compared with ours; the comparison shows that the probability-based extraction method performs better than the projection method. The application of our algorithm to text-block extraction is also described, and a "Cell Count" method is introduced to compute the probabilities used in this thesis.

Third, based on the proposed model, a word extraction method is discussed and its experimental results are analyzed.

Finally, several image binarization methods are discussed. We describe Background Surface Thresholding, a binarization method designed specifically for OCR of low-quality images; it is robust and produces images with very little noise and consistent stroke width. A method based on the statistics of the chain codes of stroke contours is also described for deslanting words.
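The abstract does not spell out the projection method used as the comparison baseline. The sketch below assumes the common horizontal-projection variant: count the black pixels in each row of a binary image and take each maximal run of rows whose count reaches a threshold as one text line. The function names and the `min_ink` parameter are hypothetical, and this is the baseline technique, not the probability-based method of the thesis.

```python
from typing import List, Optional, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def extract_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for each maximal run
    of rows whose projection is at least min_ink (hypothetical parameter)."""
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # entering a text line
        elif count < min_ink and start is not None:
            lines.append((start, y))       # leaving a text line at a blank row
            start = None
    if start is not None:                  # last line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
    ]
    print(extract_text_lines(page))  # [(1, 3), (4, 5)]
```

Running the same procedure on the columns of a single extracted line (a vertical projection) splits it at blank columns; separating words rather than characters then depends on an empirically chosen gap threshold.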
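The abstract names a chain-code statistic for deslanting but does not give the formula. One common formulation in the handwriting-recognition literature folds 8-direction Freeman codes into four orientations, counts the near-vertical ones (n1 for "/", n2 for "|", n3 for "\"), and estimates the slant from vertical as theta = arctan((n1 - n3) / (n1 + n2 + n3)); a shear x' = x - y * tan(theta) then makes the strokes upright. The sketch below assumes that formulation, Freeman codes 0 = east through 7 = south-east counter-clockwise, and y pointing up; the function names are hypothetical, and the statistic used in the thesis may differ.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def slant_from_chain_code(codes: Iterable[int]) -> float:
    """Estimate slant (radians from vertical, positive = leaning right) from
    8-direction Freeman chain codes of stroke contours (assumed formulation)."""
    orient = Counter(c % 4 for c in codes)  # fold opposite directions: 5->1, 6->2, 7->3
    n1, n2, n3 = orient[1], orient[2], orient[3]
    denom = n1 + n2 + n3
    if denom == 0:
        return 0.0                          # no near-vertical elements: treat as upright
    return math.atan((n1 - n3) / denom)


def deslant_points(points: Iterable[Tuple[float, float]], theta: float) -> List[Tuple[float, float]]:
    """Shear (x, y) points, y pointing up, so a stroke slanted by theta becomes vertical."""
    t = math.tan(theta)
    return [(x - y * t, y) for x, y in points]


if __name__ == "__main__":
    # Contour of a right-leaning stroke: up along "/" and "|", back down along them.
    contour = [1, 1, 2, 5, 5, 6]
    theta = slant_from_chain_code(contour)
    print(round(math.degrees(theta), 1))
    print(deslant_points([(0, 0), (1, 1), (2, 2)], math.pi / 4))
```

Folding codes modulo 4 lets the up and down sides of a closed contour contribute to the same orientation count, so the estimate does not depend on the tracing direction.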