Image Processing And Character Recognition For Special Class Documents

Posted on: 2018-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Huang
Full Text: PDF
GTID: 2348330512467086
Subject: Signal and Information Processing
Abstract/Summary:
OCR technology can enter the text of paper documents into a computer at high speed, and research on OCR has greatly promoted the development of image processing and pattern recognition. OCR resolves the mismatch between the speed of information input and the speed of information processing, improves the overall efficiency of computer systems, and saves unnecessary labor. With the accelerating pace of informatization, OCR has become the preferred non-keyboard input technology and serves a wide range of industries. OCR achieves a good recognition rate on high-quality document images, but on low-quality or multi-font document images the results are usually unsatisfactory. Improving OCR for these special classes of document images therefore remains an urgent task.

This thesis analyzes the characteristics of special-class document images and concludes that for low-quality document images the image preprocessing stage of the OCR system should be improved, whereas for multi-font document images the emphasis should be on the Chinese character recognition algorithm. Accordingly, this thesis surveys a large number of binarization and character recognition algorithms from China and abroad, and proposes two algorithms, one for low-quality document images and one for multi-font document images, to improve the overall recognition rate of the OCR system. The main research contents are:

1. This thesis proposes a binarization method based on local contrast enhancement for low-quality document images. First, the algorithm uses a quadtree to partition the image adaptively according to the gray-level contrast of its pixels. Then contrast enhancement is applied to adjust the gray levels in regions with different attributes. Finally, a local threshold is selected from the gray histogram of each region. The proposed algorithm is compared with four global and local algorithms on all images in the DIBCO datasets. Quantitative analysis shows that it achieves the highest F-measure and PSNR (Peak Signal-to-Noise Ratio). In addition, the images binarized by each algorithm are fed into the ABBYY character recognition software; the images binarized with the proposed algorithm achieve the highest recognition rate, up to 98.49%.

2. This thesis proposes a Chinese character recognition algorithm that combines the Gabor transform with the wavelet transform for multi-font document images. First, the test images are normalized. Second, Gabor features and wavelet features are extracted from the normalized images. Finally, an SVM (support vector machine) is used as the classifier. The thesis tests 100 Chinese characters with different stroke structures, listed in a table, and the algorithm reaches a recognition rate of more than 98.50%.
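
The binarization pipeline and the two evaluation metrics named in the abstract can be illustrated with a short sketch. This is not the thesis's implementation: the split criterion (gray-level range above `contrast_limit`), the linear contrast stretch, the use of Otsu's method for the local threshold, and all parameter values and function names are assumptions chosen for illustration. The PSNR follows the usual convention for binary images, where the foreground/background difference is 1.

```python
import numpy as np

def otsu_threshold(region):
    """Otsu threshold from the gray histogram of a uint8 region."""
    hist = np.bincount(region.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                      # pixels with gray <= t
    w1 = hist.sum() - w0                      # pixels with gray > t
    s0 = np.cumsum(hist * levels)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    return int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))

def quadtree_binarize(gray, contrast_limit=40, min_size=16):
    """gray: 2-D uint8 image. Returns a bool array, True = text pixel.

    Assumed rule: split a block while its gray range exceeds
    contrast_limit; low-contrast leaves are treated as background
    (a simplification that would miss blocks lying inside a stroke).
    """
    out = np.zeros(gray.shape, dtype=bool)

    def process(r0, r1, c0, c1):
        block = gray[r0:r1, c0:c1]
        lo, hi = int(block.min()), int(block.max())
        h, w = r1 - r0, c1 - c0
        if hi - lo <= contrast_limit:
            return                            # uniform region: background
        if min(h, w) >= 2 * min_size:         # still large enough to split
            rm, cm = r0 + h // 2, c0 + w // 2
            for a, b in ((r0, rm), (rm, r1)):
                for c, d in ((c0, cm), (cm, c1)):
                    process(a, b, c, d)
            return
        # Leaf with text: stretch contrast, then threshold locally.
        stretched = ((block.astype(float) - lo) * 255.0 / (hi - lo)).astype(np.uint8)
        out[r0:r1, c0:c1] = stretched <= otsu_threshold(stretched)

    process(0, gray.shape[0], 0, gray.shape[1])
    return out

def f_measure(pred, truth):
    """Harmonic mean of precision and recall over text pixels."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def psnr(pred, truth):
    """PSNR in dB between two binary images (peak difference = 1)."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(truth, float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(1.0 / mse))
```

A typical use would be `f_measure(quadtree_binarize(gray), ground_truth)` on an image with a known ground-truth mask, such as those distributed with the DIBCO benchmarks.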
Keywords/Search Tags: binarization, regional contrast enhancement, Chinese character recognition, wavelet transform, Gabor transform
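
The feature extraction stage of the second contribution (normalization, then Gabor and wavelet features) might look roughly like the sketch below. The abstract does not give the filter bank, the wavelet family, or the feature layout, so everything here is an assumption: a 32x32 nearest-neighbour normalization, four Gabor orientations pooled over a 4x4 grid, and a two-level averaging Haar transform. All function names are hypothetical. The SVM classifier is not shown; the resulting vector is what such a classifier would take as input.

```python
import numpy as np

def normalize(img, size=32):
    """Nearest-neighbour resize to size x size, gray values scaled to [0, 1]."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    out = img[np.ix_(rows, cols)].astype(float)
    span = out.max() - out.min()
    return (out - out.min()) / span if span else np.zeros_like(out)

def gabor_kernel(theta, wavelength=8.0, sigma=4.0, half=7):
    """Real (cosine) Gabor kernel at orientation theta (assumed parameters)."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(img, n_orient=4, grid=4):
    """Mean absolute filter response per grid cell, per orientation."""
    feats = []
    spectrum = np.fft.fft2(img)
    for k in range(n_orient):
        kern = gabor_kernel(np.pi * k / n_orient)
        # Circular convolution via FFT; adequate for pooled magnitudes.
        resp = np.abs(np.fft.ifft2(spectrum * np.fft.fft2(kern, s=img.shape)).real)
        cells = resp.reshape(grid, img.shape[0] // grid, grid, img.shape[1] // grid)
        feats.append(cells.mean(axis=(1, 3)).ravel())
    return np.concatenate(feats)

def haar_level(a):
    """One 2-D averaging Haar step: approximation plus three detail sub-bands."""
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    approx = (lo[0::2] + lo[1::2]) / 2
    d1 = (lo[0::2] - lo[1::2]) / 2
    d2 = (hi[0::2] + hi[1::2]) / 2
    d3 = (hi[0::2] - hi[1::2]) / 2
    return approx, d1, d2, d3

def wavelet_features(img, levels=2):
    """Detail-band mean magnitudes per level, plus the final approximation."""
    feats = []
    approx = img
    for _ in range(levels):
        approx, d1, d2, d3 = haar_level(approx)
        feats.append(np.array([np.mean(np.abs(d)) for d in (d1, d2, d3)]))
    feats.append(approx.ravel())
    return np.concatenate(feats)

def combined_features(img):
    """Concatenate Gabor and wavelet features of the normalized character image."""
    norm = normalize(img)
    return np.concatenate([gabor_features(norm), wavelet_features(norm)])
```

With these assumed settings each character image yields a 134-dimensional vector (64 Gabor values and 70 wavelet values).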