| Text detection in natural scene images has long been an important research topic in machine vision, playing an important role in content-based image retrieval, robot navigation, industrial automation, and intelligent transportation systems. Text detection algorithms for natural scene images have two main stages: character candidate selection and character classification. In the first stage, traditional text localization algorithms generally adopt a cascade structure to filter the candidates. Once a candidate is filtered out, it is difficult to recover, which leads to low recall in text detection. In the second stage, traditional text detection algorithms often adopt supervised learning, which uses a large database to train the classifier. Supervised algorithms have two main disadvantages. First, building the database is expensive. Second, the generalization of the algorithm is limited by the training database. This thesis studies and analyzes the complexity of scene text detection and improves the text detection process on the basis of traditional algorithms. We propose an unsupervised text detection algorithm to improve the generalization and stability of text detection. The main research contents and results of this thesis are as follows: (1) Recent research on text detection is reviewed, and the current difficulties in text detection are analyzed. (2) In the image preprocessing stage, this thesis first uses Simple Linear Iterative Clustering (SLIC) to generate primary superpixels with relatively uniform size and distribution. To make the superpixels more consistent with character edges, this thesis uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster the primary superpixels into advanced superpixels that are fewer in number and have stronger feature discrimination. Subsequently, images are processed with 
the advanced superpixels as the unit. Superpixel-based text detection classifies regions across the entire image, which improves the recall of text detection. The text candidates are clustered according to density and text characteristics, which greatly reduces the number of candidates and the computational cost of the algorithm. It also enhances the feature discrimination of each superpixel candidate and improves the accuracy of text region classification. (3) In the character candidate selection stage, to detect text in the target, this thesis first proposes a text saliency detection method. Next, to improve the precision of Maximally Stable Extremal Regions (MSER) and the recall of the saliency map, this thesis combines MSER and text saliency detection to extract text candidates. MSER performs well at detecting local characters, while salient object detection pays more attention to detecting the overall target. This thesis analyzes and exploits the complementary properties of MSER and saliency detection to generate a text map and a non-text map. These maps provide references for text sample selection, so they are also called text sample reference maps. (4) In the text classifier training stage, to avoid the database dependency of traditional supervised algorithms, this thesis proposes a text sample selection model based on the correlation between characters in the same image. The text samples extracted by the model are used to train the character classifier, thereby achieving unsupervised learning. During sample extraction, the model first uses a double-threshold mechanism to classify all superpixels into three categories based on the sample reference maps: strong text, weak text, and non-text. Unlike the traditional single threshold for binary classification, the double-threshold mechanism provides a buffer zone for the classification. Superpixels that cannot be accurately grouped into the text 
or non-text class are placed in the weak-text group. As a result, superpixels labeled strong text or non-text are more likely to be true text or true background, respectively. Strong text and non-text are regarded as positive and negative samples, respectively, and are used to train the multi-kernel boosting classifier. In the character classification stage, superpixels assigned to the weak-text class are re-classified by the trained classifier. Finally, extensive qualitative and quantitative experiments show that the proposed algorithm copes better with complex text detection tasks. |
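
The double-threshold mechanism described in item (4) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `partition_superpixels`, the per-superpixel score in [0, 1] read from a sample reference map, and the threshold values 0.3 and 0.7 are all assumptions made for illustration. Scores at or above the high threshold are treated as strong text (positive samples), scores at or below the low threshold as non-text (negative samples), and everything in between falls into the weak-text buffer zone that the trained classifier would later re-classify.

```python
from typing import Dict, List, Sequence


def partition_superpixels(
    text_scores: Sequence[float],
    low: float = 0.3,   # illustrative lower threshold (assumption)
    high: float = 0.7,  # illustrative upper threshold (assumption)
) -> Dict[str, List[int]]:
    """Split superpixel indices into strong text, weak text and non-text.

    text_scores[i] is a hypothetical text likelihood in [0, 1] for
    superpixel i, e.g. read from a text sample reference map.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")

    groups: Dict[str, List[int]] = {
        "strong_text": [],  # positive training samples
        "weak_text": [],    # buffer zone, re-classified after training
        "non_text": [],     # negative training samples
    }
    for index, score in enumerate(text_scores):
        if score >= high:
            groups["strong_text"].append(index)
        elif score <= low:
            groups["non_text"].append(index)
        else:
            groups["weak_text"].append(index)
    return groups


# Example with five hypothetical superpixel scores.
scores = [0.95, 0.10, 0.50, 0.72, 0.31]
groups = partition_superpixels(scores)
```

With these example scores, superpixels 0 and 3 become positive samples, superpixel 1 becomes a negative sample, and superpixels 2 and 4 stay in the weak-text group. Widening the gap between the two thresholds enlarges the buffer zone, trading fewer training samples for cleaner labels.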