Font Size: a A A

Deep Learning Based Scene Text Recognition

Posted on:2017-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:P HuangFull Text:PDF
GTID:2308330482981823Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Different from common visual elements, text in natural scenes conveys rich information of high level semantics, which plays a key role in the scene understanding. In addition, in the industrial areas, there is a strong demand of the technique of text recognition in natural scene. Up to now, text recognition has been widely used in many fields, such as virtual reality, human-computer interaction, image retrieval, unmanned, license plate recognition, industrial automation. Traditional optical character recognition(OCR) technique mainly works on the high-quality document images. OCR technique has already achieved a high recognition rate under the circumstance that the background of the image is clean and the characters in the image are simple and neatly arranged. Different from document image based character recognition, recognizing characters in the natural scene image is more challenging. Because the natural scene image has many difficult elements, such as a complex background, the low resolution, the diverse fonts and the randomly distribution. Therefore, conventional OCR cannot be applied to the natural scene image to do text recognition. As a fundamental work, the continuous development and breakthrough of natural scene based text recognition has far-reaching significance and practical value. This paper proposes an approach of text recognition in natural scenes image by combining deep learning. The main work is as follows:(1) A context-aware image decoding method based on CNN and BiRNN is proposed. Taking advantage of CNN, the high-level visual features can be obtained from the underlying pixels. And exploiting the characteristics of local perceptual of CNN, the positional relationship between the high-level features and the underlying pixels is built. Then making use of BiRNN, the global information of the image can be obtained. Experiments show that this decoding method has a good ability of expression.(2) The text decoding method based on ARSG is proposed, in which character alignment and recognition are completed at the same time. ARSG uses RNN to perform the sequence labeling task. In the per-character classification process, we align each character by finding out the attention points of current RNN state. At the same time, heuristic rules and delay generation technique are used to improve the efficiency and accuracy. Experiments show that this method can lead to better character alignments and recognition results.(3) One efficient deep learning framework is achieved. This framework can support a variety of neural network architecture, and provide a series of effective training strategies. The availability of our approach has been proved by this framework. Experiments show that, compared to earlier works, our approach has higher accuracy and better generalization ability.
Keywords/Search Tags:Character Recognition, Natural Scene Image, Deep Learning, Image Understanding, Semantic
PDF Full Text Request
Related items