
Deep Model And Its Application To Visual Text Analysis

Posted on: 2017-02-10  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S Y Zhang  Full Text: PDF
GTID: 1108330503485219  Subject: Information and Communication Engineering
Abstract/Summary:
Visual text analysis is a technique that intelligently perceives and understands textual information in the environment by means of machine vision, including automatically locating text in scene images, recognizing the text, and understanding the relevant text attributes. It has a wide range of applications, e.g., language translation, image semantic understanding, human-computer interaction, reading assistance for the blind, photograph-and-translate services, image retrieval, and automatic driving, and it remains one of the research difficulties in the fields of machine vision and pattern recognition. However, previous methods or models share the following limitations: (1) they use shallow feature representations; (2) each module is designed independently; (3) they are unable to automatically learn effective feature representations. Moreover, in practice, a number of complex factors challenge traditional methods or models, such as the large number of similar characters in handwritten Chinese, character regions that do not satisfy the definition of a connected component, and the difficulty of learning effective representations for font recognition.

To address these problems, this thesis focuses on three tasks of visual text analysis: the discovery and recognition of similar handwritten Chinese characters, the extraction of character proposal regions in scene images, and representation learning for Chinese font recognition. Specifically, we (1) propose an algorithm for discovering similar Chinese characters that can be applied in a cascaded classification framework; (2) develop a character proposal network for robustly extracting scene text; and (3) design and improve representation learning for font recognition. The contributions and innovations of this thesis are as follows.

First, for similar Chinese characters, this thesis proposes a multiple-confidence decision method incorporating an entropy-based measurement for discovering similar Chinese characters. Although deep convolutional neural networks (CNNs) significantly improve the overall recognition accuracy, the problem of similar Chinese characters is still not well solved. After carefully analyzing the confidence properties of the test samples, we designed a multiple-confidence decision scheme for discovering similar character subsets and similar character pairs. Since the number of similar characters per class is not equal and the confusion degree of each character pair is diverse, we further incorporated a similarity ranking method based on an entropy measurement into the proposed similar character discovery algorithm. The ranking makes it possible to cover as many misclassified samples as possible with fewer similar character pairs. We also provided a quantitative analysis to guide the subsequent design of the cascaded classification framework. Finally, based on the similar character pairs, this thesis presents a cascaded classification framework that combines deep neural network learning with dictionary pair learning. By systematically analyzing the advantages and disadvantages of different models in terms of recognition accuracy and efficiency, we introduce dictionary pair learning into the second classification stage of the cascaded framework to resolve similar character classification. Experimental results show that the proposed similar character discovery (SCD) algorithm achieves hit rates of 98.44% and 98.05% on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.0-1.2 datasets, respectively, and that the proposed cascaded classification framework decreases the error rate on these two datasets by 18.54% and 16.99%, respectively.
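As a minimal illustration of the pair-discovery idea, the Python sketch below assumes only that a trained classifier yields a softmax confidence vector for each test sample; the function name, the confidence-gap threshold, and the exact entropy-style ranking are illustrative assumptions rather than the thesis implementation.

import numpy as np

def discover_similar_pairs(probs, labels, conf_gap=0.2):
    # Multiple-confidence style decision: a sample is treated as confusable when
    # the gap between its top-1 and top-2 softmax confidences is below conf_gap.
    pair_counts = {}
    for p in probs:
        top1, top2 = np.argsort(p)[::-1][:2]
        if p[top1] - p[top2] < conf_gap:
            a, b = sorted((int(top1), int(top2)))
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    def score(pair):
        # Entropy-style measure: pairs with large and balanced two-way confusion
        # rank first, so a short list of pairs covers many misclassified samples.
        a, b = pair
        p_ab = probs[labels == a][:, b].mean() if np.any(labels == a) else 0.0
        p_ba = probs[labels == b][:, a].mean() if np.any(labels == b) else 0.0
        q = np.clip(np.array([p_ab, p_ba]) / max(p_ab + p_ba, 1e-12), 1e-12, 1.0)
        return pair_counts[pair] * float(-(q * np.log(q)).sum())

    return sorted(pair_counts, key=score, reverse=True)

# Toy usage with random "softmax" outputs over 10 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=200)
labels = rng.integers(0, 10, size=200)
print(discover_similar_pairs(probs, labels)[:5])

The ranked pairs would then define the second, pair-specific classification stage of a cascaded framework of the kind described above.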
Secondly, for character proposals in scene text images, we propose a robust method for extracting character proposals, namely the character proposal network (CPN). Previous experimental results suggest that traditional methods are prone to false alarms or missed detections in several cases, including multiple touching characters, a single character separated into multiple components, and non-uniform illumination. To address these problems, we investigated several popular object proposal methods and studied two commonly used character proposal methods, the maximally stable extremal region (MSER) and the stroke width transform (SWT). By absorbing the advantages of the sliding-window approach and exploiting the shared convolution computation of fully convolutional networks, we derived the mapping relationship of receptive fields during the forward and backward passes and developed the character proposal network for locating characters. The multi-task learning idea is incorporated into the CPN framework, enabling it to simultaneously output a character-likelihood response map and a location response map. Moreover, a multiple aspect-ratio template strategy is introduced into CPN to better cope with the varying aspect ratios of characters. By embedding this reliable prior knowledge into the learning framework, CPN predicts positions that are closer to the ground-truth character regions. Experimental results show that the proposed CPN achieves recall rates of 93.88%, 93.60% and 96.46% with 1000 proposals on the ICDAR 2013, SVT and Chinese2k datasets, outperforming the MSER, EdgeBoxes, Selective Search and MCG algorithms. In the course of this study, we also collected and annotated a multi-lingual text detection and recognition dataset, SCUT-FORU-DB, which consists of 3,931 scene images and 55,209 annotated instances of characters or words. The dataset is publicly released at https://www.dropbox.com/s/06wfn5ugt5v3djs/SCUT_FORU_DB_Release.rar?dl=0 and is free for research purposes.
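To illustrate how a fully convolutional, multi-task head can emit both response maps in one pass, here is a small PyTorch sketch; the class name, channel sizes, and the choice of three aspect-ratio templates are assumptions for illustration and not the network described in the thesis.

import torch
import torch.nn as nn

class CPNHead(nn.Module):
    # For each of K aspect-ratio templates at every cell of a shared feature map,
    # predict a character/background score and a 4-d box refinement.
    def __init__(self, in_channels=512, num_templates=3):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(256, 2 * num_templates, kernel_size=1)  # character-likelihood map
        self.loc = nn.Conv2d(256, 4 * num_templates, kernel_size=1)  # location (offset) map

    def forward(self, feats):
        h = torch.relu(self.shared(feats))
        return self.cls(h), self.loc(h)

# Toy usage: one forward pass over a 40x40 feature map gives dense per-cell,
# per-template predictions, mirroring a sliding window with shared computation.
scores, boxes = CPNHead()(torch.randn(1, 512, 40, 40))
print(scores.shape, boxes.shape)  # [1, 6, 40, 40] and [1, 12, 40, 40]

Because the convolutions are shared across all positions, dense prediction costs far less than evaluating an explicit sliding window patch by patch.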
Third, for understanding text properties, we propose a fast font recognition method based on local features. We experimentally found that certain interest points on character strokes carry rich, informative font features. Based on this observation, we locate the interest points with a corner detector and then extract local features around them. This enables a relatively small number of interest points to provide sufficiently discriminative features, which significantly speeds up font feature extraction. Experimental results show that the proposed Chinese font recognition system accelerates feature extraction by a factor of 20 without loss of accuracy. In addition, we developed a method for automatically collecting and annotating scanned document characters and released a multi-lingual scanned document font database. We also designed a Poisson-editing-based text image rendering scheme using computer graphics and image processing techniques; the resulting synthetic natural images can be used for word recognition, character recognition, font retrieval, character segmentation, etc. Finally, we propose a regularization technique, called DropRegion, to improve representation learning in deep convolutional neural networks. On the commonly used MSDF-DB dataset, DropRegion increases single-character font recognition accuracy by 3.03%, 2.95% and 1.46% under different numbers of training samples, and the DropRegion-based font recognition system achieves a recognition accuracy of 99.7% on MSDF-DB, verifying that DropRegion is a simple yet effective regularization technique.
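As a rough sketch of how a DropRegion-style regularizer can be applied to feature maps during training, the snippet below zeroes one randomly placed rectangular region; the drop probability and region size are illustrative parameters, not the settings used in the thesis.

import torch

def drop_region(x, drop_prob=0.5, region_frac=0.25, training=True):
    # With probability drop_prob, zero out one random rectangle in every feature
    # map of the batch; at test time the input passes through unchanged.
    if not training or torch.rand(()).item() > drop_prob:
        return x
    _, _, h, w = x.shape
    rh, rw = max(1, int(h * region_frac)), max(1, int(w * region_frac))
    top = int(torch.randint(0, h - rh + 1, ()).item())
    left = int(torch.randint(0, w - rw + 1, ()).item())
    mask = torch.ones_like(x)
    mask[:, :, top:top + rh, left:left + rw] = 0
    return x * mask

# Toy usage on a batch of 8 feature maps, as would sit between convolutional layers.
out = drop_region(torch.randn(8, 64, 32, 32))

Forcing the network to classify fonts with part of a feature map removed discourages reliance on any single local region, which is the regularizing effect reflected in the results above.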
Keywords/Search Tags:Deep Model, Visual Text Analysis, Similar Characters, Character Proposal Network, DropRegion, Font Recognition, Convolutional Neural Network