
Research On OCR Algorithm For Low Quality Chinese Image Based On Deep Learning

Posted on: 2020-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330596976774
Subject: Engineering
Abstract/Summary:
Traditional OCR (Optical Character Recognition) is mainly used to recognize text in documents and mail, and the related technology is mature. In recent years, deep learning has achieved remarkable results in computer vision. Building an OCR algorithm for low-quality Chinese images with deep learning techniques is a valuable and challenging task. This thesis uses deep learning to improve three stages: image preprocessing, image feature extraction, and text recognition. Deep convolutional neural networks, long short-term memory networks, and other techniques are applied in these three stages, and the recognition model is trained on the CTW-12k dataset, yielding a recognition model for low-quality Chinese images.

First, low-resolution images are reconstructed into high-resolution images with a deep-learning-based super-resolution approach, and an improved method is proposed on that basis. Because the recognition model in this thesis requires fixed-size image input, low-resolution images must be scaled up. Images enlarged by traditional interpolation are blurred, and some image features are destroyed. The deep-learning-based super-resolution method instead extracts features from the image and upscales those features for reconstruction, producing a clear enlarged image that improves text recognition accuracy.

Second, transfer learning is used to reduce the amount of training data needed for the feature extraction model. Because low-quality Chinese images are complex, training a feature extraction model from scratch requires a large amount of training data. With transfer learning, knowledge is migrated from similar tasks to the current task, so only a small amount of training data is needed to train the feature extraction model.

Third, a long short-term memory network with a spatial attention mechanism is used as the character recognition model. Characters in low-quality Chinese images are difficult to separate and recognize because of background noise, occlusion, and missing parts. The spatial attention mechanism filters the feature maps produced by the feature extraction model to obtain the character features, and the long short-term memory network uses those features to predict the text.

Based on the above methods, a low-quality Chinese image OCR system is implemented. Its main modules are image preprocessing, image feature extraction, and text recognition. Experimental comparison and analysis show that the low-quality Chinese image OCR system presented in this thesis performs better than the other methods compared.
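
The abstract does not give the formulation of the spatial attention step, so the following is only a minimal sketch of one common choice, additive attention over the spatial positions of a feature map, conditioned on the recurrent state. The function name spatial_attention, the parameters w_f, w_h and v, and all dimensions are assumptions made for illustration; they are not taken from the thesis. The sketch uses only the Python standard library so that it runs as-is.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a length-n vector by an n x m matrix, returning a length-m list."""
    return [sum(x * row[j] for x, row in zip(vec, mat)) for j in range(len(mat[0]))]


def spatial_attention(features, hidden, w_f, w_h, v):
    """Additive attention over spatial positions (illustrative, not the thesis's model).

    features: list of P positions (a flattened H x W grid), each a list of C floats
    hidden:   list of D floats, e.g. the current LSTM state
    w_f (C x A), w_h (D x A), v (A): hypothetical learned parameters
    Returns the length-C context vector and the P attention weights.
    """
    h_proj = matvec(hidden, w_h)
    scores = []
    for f in features:
        f_proj = matvec(f, w_f)
        # Score for this position: v . tanh(W_f f + W_h h)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, f_proj, h_proj)))

    # Softmax over positions, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the position features.
    channels = len(features[0])
    context = [sum(wt * f[c] for wt, f in zip(weights, features)) for c in range(channels)]
    return context, weights


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
P, C, D, A = 12, 8, 6, 5  # positions, channels, state size, attention size (arbitrary)
features = random_matrix(rng, P, C, scale=1.0)
hidden = [rng.gauss(0.0, 1.0) for _ in range(D)]
w_f = random_matrix(rng, C, A)
w_h = random_matrix(rng, D, A)
v = [rng.gauss(0.0, 1.0) for _ in range(A)]

context, weights = spatial_attention(features, hidden, w_f, w_h, v)
```

In a recognizer of this kind, the context vector would typically be fed to the recurrent network at each decoding step, so that positions dominated by background noise or occlusion receive low weight. How the thesis wires this together is not stated in the abstract.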
Keywords/Search Tags: OCR, Deep Learning, Super Resolution, Transfer Learning, Chinese Text Recognition