
Research On Deep Learning Based Chinese Scene Text Detection And Recognition

Posted on: 2018-02-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Ren
Full Text: PDF
GTID: 1368330590955290
Subject: Information and Communication Engineering
Abstract/Summary:
Text detection and recognition are of great significance in many computer vision applications, such as image and video content understanding, information retrieval, and target localization. They are also valuable in emerging applications such as robot vision systems, autonomous driving, and virtual reality. Because scene text is highly complex, traditional image processing methods struggle to extract text regions and features accurately and comprehensively. Compared with the widely studied English characters, Chinese characters have two-dimensional spatial structures, which makes them harder to distinguish from complex backgrounds and harder to recognize. Deep learning models are a popular basis for image processing algorithms. Unlike traditional hand-crafted feature extraction algorithms, they learn deep image features from the image data, which makes feature design more efficient and targeted. In this work, we propose a merged Chinese scene text detection and recognition algorithm that inherits the design of the classical text information extraction framework, incorporates recent understanding of text detection and recognition algorithms, and exploits both the strengths of deep learning models and the unique structural features of Chinese characters. We focus on three key functional modules and methods: 1) candidate text region extraction, which separates text from background regions and provides image regions for the deep learning model to detect and recognize; 2) Chinese text feature extraction, which extracts text structure features that are conducive to Chinese text detection and recognition models; 3) deep learning model training tools, which keep the training process accurate and effective even though existing natural Chinese scene text datasets are insufficient in size and labeling. Finally, based on the above research, we attempt to merge the independent text detection and recognition algorithms in traditional
frameworks. The contributions focus on the following aspects:

(1) Candidate text region extraction. Candidate text regions are the local image regions used for text detection and recognition, and their extraction accuracy is important to algorithm performance. Traditional image region extraction algorithms are mostly designed for general image objects, so the extracted text regions are often mixed with complex background. The independence of the extracted regions thus becomes the weak point in accuracy, which degrades text detection and recognition performance, especially for algorithms based on deep learning models. In the maximally stable extremal region (MSER) extraction algorithm, edge blur is often regarded as a significant local interference in the image that affects the independence of the extracted regions. In this work, we statistically analyze the MSERs from scene images with a focus on edge blur, and we find that text and background regions differ significantly in edge blur. Based on this finding, we add edge blur analysis to the MSER extraction algorithm to extract image regions that exhibit obvious text edge blur. Because the extracted image regions are strongly independent, the algorithm is named I-MSER. We design an adjustment parameter for the edge blur tolerance, so that the extracted regions can be used by both text detection and recognition algorithms. The experimental results show that text detection algorithms using I-MSER perform much better than those using other candidate text region extraction algorithms.

(2) Chinese text feature extraction. In Chinese scene text detection and recognition algorithms, the uniqueness and the distinguishability of Chinese text features are the focus of the detection algorithm and the recognition algorithm, respectively. Because Chinese text is structurally very complex, the features extracted by commonly used deep image feature extraction structures lack both uniqueness and distinguishability. We explore the essential features of Chinese text from the evolution of Chinese text and the Chinese
text cognition of humans. We find that structural components are the key to Chinese character construction and cognition. Through statistical analysis of Chinese characters, we find that the structure components are highly similar in size and can be divided into eight types. Based on this statistical analysis and the feature extraction structure of deep learning models, we design a dedicated detection window for each component type that is sensitive to that type and insensitive to the others, making it a specialized extraction structure for text structure components. In the deep learning model, these extraction structures are placed in parallel to form a dedicated extraction layer for structure features, called the Chinese text structure component detector (TSCD) layer. The parallel structures in the TSCD layer keep the extraction processes from interfering with one another, so the layer can extract accurate and comprehensive structure features, with high uniqueness and strong distinguishability, from the text stroke features of the preceding layer. The experimental results show that the extracted Chinese text structure features make both text detection and recognition algorithms perform better than basic deep learning features do. Moreover, they are advantageous for merging the text detection and recognition stages of a text detection and recognition algorithm.

(3) Training modules for deep learning models. The size and labeling of existing scene text image datasets are very limited, especially for Chinese text. Training a deep learning model on these datasets with classic methods makes it sensitive to the text images, which limits text detection and recognition performance. Because building large datasets is very costly, efficient training methods are essential for Chinese text detection and recognition algorithms based on deep learning models. We explore training modules for deep learning models from two perspectives: unsupervised learning and training sample expansion. For highly abstract Chinese text, the convolution layer and
the sparse-coding method are combined into a novel unsupervised learning method, named the convolutional sparse auto-encoder (CSAE). It enables convolution layers to learn effective parameters from unlabeled artificial Chinese text samples, and these parameters are used as pre-training parameters for the convolution layers in a CNN. We also propose an artificial Chinese text sample generator, designed by analyzing the three generation stages of Chinese scene text. It easily generates a large number of artificial Chinese text samples that closely resemble scene text samples and can be used to pre-train the deep learning model effectively. The experimental results show that the CSAE and the artificial Chinese text sample generator help train deep learning models for Chinese text detection and recognition. They are of vital importance for merging Chinese text detection and recognition algorithms.

(4) An attempt to merge Chinese text detection and recognition algorithms. Traditionally, text detection and text recognition are two separate algorithms. Because they differ in many respects in algorithm design and feature selection, combining them into an end-to-end text detection and recognition algorithm causes serious incompatibility, which degrades performance. Recently, merged text detection and recognition algorithms have been widely studied because their merged design of functional components eliminates this incompatibility. By analyzing the structure of the separate text detection and recognition models, we find that the candidate text region and the Chinese text feature are the important merging points. Based on this finding, we design a merged Chinese scene text detection and recognition algorithm built on the I-MSER extraction algorithm and the TSCD layer. There are two key points in the merging: 1) a candidate text region generation algorithm, which corrects I-MSERs so that each corresponds to a single Chinese character; 2) a merged Chinese text detection and recognition deep learning model, which inputs features extracted by the
TSCD layer into the detection and recognition classifiers at the same time and merges the results based on prior knowledge. The experimental results show that the merged structure of the Chinese text detection and recognition algorithm achieves better results than the traditional structures, which demonstrates the great potential of the merged structure.
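
The edge-blur analysis described under contribution (1) can be illustrated with a minimal sketch. The abstract does not give the statistic I-MSER uses or the direction of its threshold, so everything here is an assumption made for illustration: candidate regions are supplied as boolean masks (no MSER extraction is performed), edge blur is approximated by the mean gradient magnitude along the region boundary, and the hypothetical `tolerance` parameter stands in for the adjustable edge blur tolerance. The function names `boundary_mask`, `edge_sharpness`, and `filter_regions` are likewise hypothetical.

```python
import numpy as np


def boundary_mask(region):
    """Pixels of a boolean mask that have a 4-neighbour outside the mask."""
    padded = np.pad(region, 1, constant_values=False)
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1] & padded[2:, 1:-1]   # up and down neighbours
        & padded[1:-1, :-2] & padded[1:-1, 2:]   # left and right neighbours
    )
    return region & ~interior


def edge_sharpness(gray, region):
    """Assumed blur proxy: mean gradient magnitude on the region boundary.

    A high value means a crisp edge, a low value means a blurred edge.
    """
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    edge = boundary_mask(region)
    return float(magnitude[edge].mean()) if edge.any() else 0.0


def filter_regions(gray, regions, tolerance):
    """Keep regions whose boundary sharpness is at least `tolerance`.

    `tolerance` is a hypothetical stand-in for the adjustment parameter
    mentioned in the abstract; lowering it admits blurrier regions.
    """
    return [r for r in regions if edge_sharpness(gray, r) >= tolerance]


# Toy image: a sharp-edged square (left) and a soft-edged blob (right).
rows = np.clip(1 - np.abs(np.arange(20) - 10) / 8, 0, 1)
cols = np.clip(1 - np.abs(np.arange(40) - 30) / 8, 0, 1)
gray = np.outer(rows, cols)              # blob with a gradual intensity ramp
gray[5:15, 5:15] = 1.0                   # square with an abrupt edge

sharp = np.zeros_like(gray, dtype=bool)
sharp[5:15, 5:15] = True                 # candidate region for the square
soft = np.outer(rows, cols) > 0.5        # candidate region for the blob

kept = filter_regions(gray, [sharp, soft], tolerance=0.3)
print(len(kept), "of 2 candidate regions kept")
```

In this toy setup the sharp square passes the filter and the soft blob does not. In a real pipeline the masks would come from an MSER extractor, and the threshold direction would follow whatever the statistical analysis of text versus background regions shows.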
Keywords/Search Tags:Chinese text, text detection, image feature, text structure, image region extraction, statistical model, region tree, unsupervised learning, convolutional neural network, pre-training, artificial image generation, cognition model