Adaptive Binarization And Character Recognition For Document Image

Posted on: 2019-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: K Lin
GTID: 2428330542972890
Subject: Signal and Information Processing
Abstract/Summary:
As office work is increasingly digitized, there is a growing demand for converting document images into digital documents. However, storing scanned documents directly as images or entering their information manually is inefficient, and image storage requires a large amount of space. Optical Character Recognition (OCR) technology has made it more convenient to convert document images into digital documents and to store them. As OCR technology has matured, it has come into use across many fields. At present, OCR achieves good recognition accuracy on high-quality document images, but its results on low-quality document images are not ideal, so further research is still needed on applying OCR to such images. A study of the characteristics of low-quality document images shows that OCR performance on them depends largely on image preprocessing, namely binarization. For the recognition of multi-font printed Chinese characters, the recognition system must be both accurate and stable. This thesis therefore studies binarization and character recognition algorithms from China and abroad, and aims to improve the accuracy and stability of OCR systems by improving the binarization of low-quality document images and the recognition of multi-font printed Chinese characters. The main work is as follows:

First, the gray histograms of different images have different characteristics, with certain relationships and differences among them. This thesis analyzes the features of the grayscale histograms after the images have been classified by binarization method, and combines existing binarization algorithms with a support vector machine (SVM) to realize adaptive selection of the binarization algorithm for any type of document image. To begin, the images in the Document Image Binarization Contest (DIBCO) standard library are processed, and the processed images are classified according to the predetermined optimal binarization method. Next, the characteristics of each image's gray histogram are extracted as the feature vector, and the corresponding binarization method is used as the label to build the training samples. Finally, an adaptive selection model is established with the SVM so that the binarization method is selected adaptively.

Second, by studying the characteristics of multi-font printed Chinese characters and the structure of convolutional neural networks (CNN), this thesis proposes an improved network structure based on LeNet-5. It analyzes and improves the LeNet-5 structure, including the input layer, hidden layers, activation function and output layer. It increases the number of feature extraction layers while reducing the fully connected layers, which lowers the number of training parameters and thereby reduces training costs. The LeNet-5 network structure is used to recognize printed Chinese characters with different stroke counts from the first-level character table in 100 different fonts.
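
The sample-construction step of the adaptive binarization work described above (gray histogram as feature vector, best binarization method as label) might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the two candidate methods (a global mean threshold and Otsu's method), the use of pixel accuracy against the ground truth as the "optimal method" criterion, and all function names are hypothetical choices made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]   # 8-bit grayscale rows, values 0..255
Binary = List[List[int]]  # 0 = text (dark), 1 = background (light)


def gray_counts(img: Image) -> List[int]:
    """Count how many pixels fall on each of the 256 gray levels."""
    counts = [0] * 256
    for row in img:
        for p in row:
            counts[p] += 1
    return counts


def histogram_feature(img: Image) -> List[float]:
    """Normalized gray histogram, used here as the feature vector."""
    counts = gray_counts(img)
    total = sum(counts)
    return [c / total for c in counts]


def mean_threshold(img: Image) -> int:
    """Assumed candidate 0: threshold at the (integer) mean gray level."""
    counts = gray_counts(img)
    return sum(i * c for i, c in enumerate(counts)) // sum(counts)


def otsu_threshold(img: Image) -> int:
    """Assumed candidate 1: Otsu's method (maximize between-class variance)."""
    counts = gray_counts(img)
    total = sum(counts)
    total_sum = sum(i * c for i, c in enumerate(counts))
    best_t, best_var = 0, -1.0
    n0 = 0  # number of pixels with gray level <= t
    s0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        n0 += counts[t]
        s0 += t * counts[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = s0 / n0
        mu1 = (total_sum - s0) / n1
        var = n0 * n1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Image, t: int) -> Binary:
    """Pixels brighter than t become background (1), the rest text (0)."""
    return [[1 if p > t else 0 for p in row] for row in img]


def pixel_accuracy(pred: Binary, truth: Binary) -> float:
    """Fraction of pixels that agree with the ground-truth binarization."""
    pairs = [(a, b) for rp, rt in zip(pred, truth) for a, b in zip(rp, rt)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical candidate list; the label is an index into this list.
CANDIDATES = [("mean", mean_threshold), ("otsu", otsu_threshold)]


def make_sample(img: Image, truth: Binary) -> Tuple[List[float], int]:
    """Build one (feature vector, label) training sample for the classifier."""
    scores = [pixel_accuracy(binarize(img, fn(img)), truth) for _, fn in CANDIDATES]
    label = scores.index(max(scores))  # ties go to the lowest index
    return histogram_feature(img), label
```

The resulting (feature, label) pairs could then be passed to any SVM implementation, for example scikit-learn's `SVC`, to train the selection model; at prediction time the histogram of a new image would be classified and the binarization method with the predicted label applied.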
Keywords/Search Tags: binarization, adaptive, Chinese character recognition, support vector machine, convolutional neural network