Adaptive Binarization And Character Recognition For Document Image

Posted on:2019-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:K Lin

GTID:2428330542972890

Subject:Signal and Information Processing

Abstract/Summary:

As office work is increasingly digitized, there is a growing demand for converting document images into digital documents. Storing scanned documents purely as images, or entering their content manually, is inefficient and requires a large amount of storage space. Optical Character Recognition (OCR) makes it far more convenient to convert document images into digital documents and to store them, and as the technology has matured it has been adopted across many industries. At present, OCR achieves good recognition accuracy on high-quality document images, but its results on low-quality document images are not ideal, so further research on applying OCR to such images is still needed. A study of the characteristics of low-quality document images shows that OCR performance on them depends largely on image preprocessing, that is, on binarization. For multi-font printed Chinese characters, the recognition system must deliver both high recognition accuracy and stable behavior. This thesis therefore reviews binarization and character recognition algorithms from the Chinese and international literature, and aims to improve the accuracy and stability of OCR systems in two ways: better binarization of low-quality document images, and better recognition of multi-font printed Chinese characters. The main work is as follows.

First, the gray-level histograms of different document images are related to one another but also differ in characteristic ways. This thesis analyzes the features of the gray-level histograms after the images have been grouped by the binarization method that suits them best, and combines existing binarization algorithms with a support vector machine (SVM) to select a binarization algorithm adaptively for any type of document image. The procedure has three steps. (1) The images in the Document Image Binarization Contest (DIBCO) standard library are processed, and the processed images are classified according to their predetermined optimal binarization method. (2) Features of each image's gray-level histogram are extracted as the feature vector, and the corresponding binarization method is used as the label, which together form the training samples. (3) An adaptive selection model is trained with an SVM, so that the binarization method can be chosen automatically.

Second, based on a study of the characteristics of multi-font printed Chinese characters and of the structure of convolutional neural networks (CNNs), this thesis proposes an improved network structure based on LeNet-5. The input layer, hidden layers, activation function and output layer of LeNet-5 are analyzed and modified. The improved structure increases the number of feature extraction layers while reducing the number of trainable parameters by cutting back the fully connected layers, which lowers the training cost. The LeNet-5-based network is then used to recognize printed Chinese characters from the first-level character table, covering characters with different stroke counts rendered in 100 different fonts.
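
The three-step construction of training samples for adaptive selection can be sketched as below. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the candidate binarization algorithms or the criterion that defines the "optimal" one, so the sketch assumes two stand-in candidates (global Otsu thresholding and a fixed threshold), assumes a pixel-level F-measure against a ground-truth image as the criterion, and uses a 16-bin normalized histogram as the feature vector. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]   # rows of 8-bit gray levels (0 = black, 255 = white)
Binary = List[List[int]]  # 1 = text pixel, 0 = background


def gray_histogram(img: Image, bins: int = 16) -> List[float]:
    """Normalized gray-level histogram, used here as the feature vector."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[v * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def otsu_threshold(img: Image) -> int:
    """Global Otsu threshold: the level that maximizes between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]          # pixels with gray level <= t
        if w0 == 0:
            continue
        w1 = total - w0        # pixels with gray level > t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_otsu(img: Image) -> Binary:
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def binarize_fixed(img: Image, t: int = 127) -> Binary:
    # Assumed second candidate: a fixed mid-range threshold.
    return [[1 if v <= t else 0 for v in row] for row in img]


def f_measure(pred: Binary, truth: Binary) -> float:
    """Pixel-level F-measure of predicted text pixels against ground truth."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def build_sample(
    img: Image, truth: Binary, methods: Sequence[Callable[[Image], Binary]]
) -> Tuple[List[float], int]:
    """Return (histogram feature, index of the best-scoring method)."""
    scores = [f_measure(m(img), truth) for m in methods]
    return gray_histogram(img), scores.index(max(scores))


CANDIDATES = [binarize_otsu, binarize_fixed]  # assumed candidate set
```

Calling `build_sample(img, truth, CANDIDATES)` on every image that has a ground-truth binarization yields (feature, label) pairs. Those pairs could then be given to any SVM implementation, for example scikit-learn's `SVC`, to train the selection model; at prediction time only the histogram of a new image is needed.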
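
To see why cutting back the fully connected layers reduces the number of trainable parameters, the following sketch counts the parameters of the classic LeNet-5 layout and of a hypothetical variant. The abstract does not give the layer sizes of the improved network, so the variant here (two extra 3x3 convolution layers and a single fully connected output layer) is an assumption made purely for illustration, as are the function names and the class count used in the example. The LeNet-5 count assumes the commonly implemented form with a 32x32 single-channel input and a fully connected C3 layer.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights plus biases of a k x k convolution layer."""
    return out_ch * (in_ch * k * k + 1)


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_out * (n_in + 1)


def lenet5_params(num_classes: int) -> int:
    """Classic layout: two 5x5 conv layers, then 400 -> 120 -> 84 -> classes."""
    return (
        conv_params(1, 6, 5)            # C1: 32x32 -> 28x28, pooled to 14x14
        + conv_params(6, 16, 5)         # C3: 14x14 -> 10x10, pooled to 5x5
        + dense_params(16 * 5 * 5, 120) # C5, equivalent to a dense layer here
        + dense_params(120, 84)         # F6
        + dense_params(84, num_classes) # output layer
    )


def variant_params(num_classes: int) -> int:
    """Hypothetical variant: more conv layers, one fully connected layer."""
    return (
        conv_params(1, 6, 5)            # 32x32 -> 28x28, pooled to 14x14
        + conv_params(6, 16, 5)         # 14x14 -> 10x10, pooled to 5x5
        + conv_params(16, 32, 3)        # extra layer: 5x5 -> 3x3
        + conv_params(32, 64, 3)        # extra layer: 3x3 -> 1x1
        + dense_params(64, num_classes) # single fully connected output layer
    )


if __name__ == "__main__":
    n = 10  # assumed class count, chosen only for the comparison
    print(lenet5_params(n), variant_params(n))
```

With 10 output classes the classic layout has 61,706 parameters, of which 59,134 sit in the C5, F6 and output stages, while the hypothetical variant has 26,358. The comparison only illustrates the general trade-off the abstract describes; it says nothing about the accuracy of either layout.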

Keywords/Search Tags:

binarization, adaptive, Chinese character recognition, support vector machine, convolutional neural network

Related items

1	Adaptive Support Vector Machine And Its Application In Handwritten Chinese Character Recognition
2	The Research Of Offline Handwritten Chinese Character Recognition Based On Deep Learning
3	Research On Spray Code Character Recognition Technique Of Plate
4	Research On Off-line Handwritten Chinese Character Recognition System
5	Off-line Handwriting Chinese Character Recognition Framework Design Based On Convolutional Neural Network
6	Research On Off-line Handwritten Chinese Character Recognition System
7	Online Chinese Handwriting Character Recognition System Based On Convolutional Neural Network
8	Research On Character Recognition Based On Neural Network And Support Vector Machine
9	Research On Handwritten Chinese Character Acquisition And Recognition System Based On Interactive Mode
10	Research On Industrial Character Recognition Method Based On Convolutional Neural Network