
Research On Scene Text Extraction And Recognition Based On Deep Learning

Posted on: 2021-05-03  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhu  Full Text: PDF
GTID: 2428330614458274  Subject: Electronic and communication engineering
Abstract/Summary:
Scene text is one of the most active research topics in computer vision and is closely tied to many real-world applications, including automatic translation, reading assistance for the blind, and license plate recognition. Recognition accuracy for document text can currently reach up to 99%. For natural scene text, however, the diversity of fonts, multi-directional text, and low image resolution make extraction and recognition a very challenging task. This thesis therefore focuses on the extraction and recognition of natural scene text. The main contents are summarized as follows:

1. A segmentation-based scene text extraction algorithm is studied. Acquiring large-scale human-labeled pixel-level data is expensive and time-consuming, whereas existing datasets already contain a large number of box-level annotations. The box-level annotations are therefore proposed as auxiliary training data. To this end, a dual-task mutual guidance network is designed, with a shared encoder and two separate decoders, one for pixel-level segmentation and one for box-level segmentation. The two decoders work in a mutually guided manner: the output of the pixel-level text segmentation decoder serves as guidance information for the box-level text segmentation decoder to improve box-level segmentation performance, and vice versa. Experiments on standard datasets show that the mutual guidance network can effectively extract text information. In addition, using the pixel-level mask can further improve text recognition performance.

2. A recognition algorithm for scene text in arbitrary directions is studied. The algorithm first uses a high-resolution segmentation network as the basic framework to extract the spatial information of the text. The spatiotemporal sequence information of the text is then extracted by convolutional long short-term memory. Meanwhile, a character attention mechanism is designed so that the model focuses on the characters, and a differentiable binarization function is used to further increase the network's attention to the foreground and weaken its attention to the background. Finally, the network classifies each pixel into one of 37 classes, and a text transcription module converts the classification results into text from left to right. The algorithm has been tested on multiple standard datasets, including ICDAR2013, ICDAR2003, SVT-Perspective, CUTE80 and IIIT-5K. It achieves good results on both regular and irregular text, which demonstrates its effectiveness.
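
The last two steps of the recognition pipeline (differentiable binarization and left-to-right transcription of a per-pixel class map) can be illustrated with a minimal Python sketch. The abstract gives no formulas or implementation details, so everything here is an assumption for illustration only: the sigmoid form of the binarization with steepness `k`, the reading of the 37 classes as one background class plus 10 digits and 26 letters, and the grouping of foreground pixels into connected regions that vote for a character. The names `ALPHABET`, `differentiable_binarize` and `transcribe` are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter

# Assumed 37-class layout: class 0 is background, classes 1-36 map to 0-9 then a-z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def differentiable_binarize(prob, thresh, k=50.0):
    """Soft step function: near 1 when prob > thresh, near 0 when prob < thresh.

    The sigmoid form and the default k are assumptions, not taken from the thesis.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def transcribe(class_map):
    """Turn a 2-D grid of per-pixel class indices into a string.

    Foreground pixels are grouped into 4-connected regions, each region takes
    its majority class, and regions are read left to right by mean column.
    """
    rows, cols = len(class_map), len(class_map[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []  # list of (mean column, character)
    for r in range(rows):
        for c in range(cols):
            if class_map[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill one connected foreground region.
            stack = [(r, c)]
            seen[r][c] = True
            votes = Counter()
            col_sum = 0
            count = 0
            while stack:
                y, x = stack.pop()
                votes[class_map[y][x]] += 1
                col_sum += x
                count += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and class_map[ny][nx] != 0):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            label = votes.most_common(1)[0][0]
            regions.append((col_sum / count, ALPHABET[label - 1]))
    regions.sort(key=lambda item: item[0])
    return "".join(ch for _, ch in regions)


# Toy class map: a region of class 18 ("h") followed by a region of class 19 ("i").
grid = [
    [0, 18, 18, 0, 19, 0],
    [0, 18, 18, 0, 19, 0],
]
print(transcribe(grid))
```

Sorting by mean column is the simplest reading of "left to right" and only suits roughly horizontal text; curved or vertical text would need an ordering along the text line instead, which this sketch does not attempt.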
Keywords/Search Tags:deep learning, scene text, scene text extraction, scene text recognition