Connected Component Based Approach For Text Localization In Images

Posted on: 2008-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: X H Ji
Full Text: PDF
GTID: 2178360215993403
Subject: Electronics and Communications Engineering
Abstract/Summary:
Text localization refers to detecting and locating character regions in images with complex backgrounds. Locating text effectively in such images extends the reach of OCR technology to applications such as content-based image and video retrieval and car license plate location and recognition. It has become an active research topic in document analysis and recognition.

In this thesis, we survey text localization methods, categorize them, and discuss their advantages and disadvantages. We then propose a text localization algorithm based on connected components (CCs) and a neural network. The method detects text regions in images effectively and is robust to variations in character size, color, and font.

The CC-based text localization method consists of four steps. First, the input image is segmented with an improved Niblack method. Second, CC analysis is applied to the segmented image to obtain the set of candidate character CCs. Third, we extract a range of features from each component. Finally, a cascade of threshold classifiers labels each CC as a character CC or a non-character CC.

A BP (back-propagation) neural network is also introduced into the classification of CCs. The CC features serve as the network's input. Training samples are labeled by hand and fed into the network to train its parameters. The trained network can classify CCs that the cascade of threshold classifiers cannot, which improves the precision of text localization.

We also use a minimum spanning tree to combine character CCs into text regions. We assume that character CCs in the same text region have similar size and color and lie close to one another. The weight of the edge between two CCs is defined from their distance and similarity. Computing this weight for every pair in the set of character CCs yields a graph. The minimum spanning tree of this graph is then divided into subsets by thresholding its edge weights, and each subset forms one text region.
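The grouping step described above can be illustrated with a minimal sketch. The abstract does not give the edge-weight formula or the threshold, so everything concrete here is an assumption made for illustration: the `Component` fields, the `edge_weight` function (size-normalized centroid distance plus size and intensity differences), the `alpha` and `beta` factors, and the use of Prim's algorithm. It is not the thesis's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical summary of one character CC (fields are assumptions)."""
    cx: float      # centroid x
    cy: float      # centroid y
    height: float  # bounding-box height
    gray: float    # mean intensity in [0, 255], standing in for color


def edge_weight(a, b, alpha=1.0, beta=1.0):
    """Assumed weight: normalized distance plus size and color dissimilarity."""
    scale = max(a.height, b.height)
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy) / scale
    size_diff = abs(a.height - b.height) / scale
    color_diff = abs(a.gray - b.gray) / 255.0
    return dist + alpha * size_diff + beta * color_diff


def mst_edges(components):
    """Prim's algorithm on the complete graph; returns (weight, i, j) tuples."""
    n = len(components)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = edge_weight(components[u], components[v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def group_text_regions(components, threshold):
    """Drop MST edges heavier than `threshold`; return groups of CC indices."""
    n = len(components)
    root = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for w, i, j in mst_edges(components):
        if w <= threshold:
            root[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_text_regions(ccs, threshold)` on a list of `Component` objects returns lists of indices, one list per candidate text region. Keeping only the light edges of the minimum spanning tree means that two CCs end up in the same region when a chain of near, similar neighbors links them, which suits characters laid out along a text line.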
Keywords/Search Tags: Text location, character recognition, component analysis, scene text, Minimum Spanning Tree, BP Neural Network