Research On Text Extraction Technologies In Natural Scene Images

Posted on: 2016-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xiao
GTID: 2308330482476816
Subject: Communication and Information System
Abstract/Summary:
As a carrier of human thought and emotion, text conveys vital information. Text extraction from natural scene images has broad application prospects in navigation, sensitive-information regulation, scene understanding, human-computer interaction, content-based image retrieval and other areas, and it has become a research hot spot in recent years.

This thesis is based on "Telecommunication Network Security System (Ⅱ-stage)", a Theme Project of the National "Twelfth Five-Year" 863 Plan. Two key technologies of text extraction from natural scene images, scene text detection and scene text recognition, are studied, and a scene text extraction system is then constructed. The main work and contributions are as follows.

1. A scene text detection algorithm based on tree pruning and multi-cue integration is proposed. Firstly, Maximally Stable Extremal Regions (MSERs) are extracted from images as character candidates. These candidates contain a large number of repeated text regions and non-text regions, which would seriously interfere with the later detection steps. On the basis of the tree structure built from the MSERs, repeated text regions are excluded according to the aspect ratio and area ratio of parent and child nodes. Secondly, two complementary cues, the stroke width of scene text and the gradient histogram feature on the edges, are integrated by a Bayesian classifier to filter out non-text regions. Finally, similarity criteria based on the color, stroke width and size of character candidates are designed to group characters into words. Experimental results show that, although the harmonic mean of precision and recall of the proposed detection algorithm is only slightly higher than that of similar methods, the proposed method is much faster.

2. A scene text recognition algorithm based on histograms of sparse codes is proposed. This method recognizes words in the text regions selected by the scene text detection method above. Firstly, because text in natural scene images is complex and variable in appearance, the histograms of sparse codes feature is used to describe the appearance of characters. Secondly, the character structure is described by a part-based tree-structured model, which is improved by using the histograms of sparse codes of characters, and detection scores are obtained from this model. Finally, on the basis of the character recognition results, a Conditional Random Field model is built by integrating the detection scores of characters, the spatial constraints between adjacent characters and linguistic knowledge. Experimental results show that the recognition rate is 5.30% higher than that of similar methods.

3. On the basis of the two algorithms above, a scene text extraction system is designed in terms of its main function modules, composition structure and processing flow, and is implemented on the MATLAB platform. The performance of the system is tested on images selected from commonly used datasets and from the current network.
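The tree pruning step in item 1 can be illustrated with a small sketch. The abstract does not give the actual pruning rule or its thresholds, so everything below is an assumption made for illustration: the `Region` class, the `is_repeat` test, and the values `min_area_ratio=0.8` and `max_aspect_diff=0.2` are hypothetical, and the code is written in Python rather than the MATLAB used for the thesis system. The idea shown is only that a child node whose area and aspect ratio are nearly the same as its parent's is treated as a repeated region and dropped, while its own children are kept.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """Hypothetical node of an MSER tree: bounding box size and pixel area."""
    name: str
    width: int
    height: int
    area: int
    children: List["Region"] = field(default_factory=list)

    @property
    def aspect_ratio(self) -> float:
        return self.width / self.height


def is_repeat(parent: Region, child: Region,
              min_area_ratio: float = 0.8,
              max_aspect_diff: float = 0.2) -> bool:
    """Assumed rule: the child repeats the parent when it covers almost the
    same area and has almost the same aspect ratio."""
    area_ratio = child.area / parent.area
    aspect_diff = abs(parent.aspect_ratio - child.aspect_ratio)
    return area_ratio >= min_area_ratio and aspect_diff <= max_aspect_diff


def prune(node: Region) -> Region:
    """Return a copy of the tree with repeated child regions removed.
    Children of a removed node are re-attached to the surviving parent."""
    kept: List[Region] = []
    pending = [prune(child) for child in node.children]
    while pending:
        child = pending.pop(0)
        if is_repeat(node, child):
            pending.extend(child.children)  # promote the grandchildren
        else:
            kept.append(child)
    return Region(node.name, node.width, node.height, node.area, kept)


def names(node: Region) -> List[str]:
    """Pre-order list of node names, used to inspect the pruned tree."""
    result = [node.name]
    for child in node.children:
        result.extend(names(child))
    return result


if __name__ == "__main__":
    inner = Region("A3", 5, 5, 20)
    duplicate = Region("A2", 19, 39, 560, [inner])  # nearly identical to A
    char = Region("A", 20, 40, 600, [duplicate])
    root = Region("root", 100, 100, 10000, [char])
    print(names(prune(root)))  # ['root', 'A', 'A3']
```

In this toy tree, "A2" covers about 93% of the area of "A" with almost the same aspect ratio, so it is removed and its child "A3" is attached to "A". Whether the thesis keeps the parent or the child in such a pair is not stated in the abstract.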
Keywords/Search Tags: Scene Text Extraction, Maximally Stable Extremal Region, Bayesian Classifier, Deformable Part-based Model, Conditional Random Field Model, Histograms of Sparse Codes