
Deep Learning-Based Methods For Text Detection And Recognition In Natural Images

Posted on: 2019-05-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B G Shi
Full Text: PDF
GTID: 1368330545490374
Subject: Information and Communication Engineering
Abstract/Summary:
Text is a cornerstone of human civilization. As a visual element, text is everywhere in the modern world. Numerous common objects, such as documents, road signs, product packaging, license plates, and store facades, carry text and are described by text. Reading text is a basic function of human vision and an important research topic in computer vision. Technologies that recognize text in images, also known as Optical Character Recognition (OCR), facilitate a broad range of practical applications: license plate recognition, geolocation, receipt digitalization, autonomous driving, and cashierless stores, to name a few. Despite decades of study on document text recognition, scene text recognition has proven to be a much more challenging problem, mainly due to the large variations in font, color, scale, layout, and image quality that traditional OCR methods struggle to handle. With the fast-paced evolution of deep learning algorithms in recent years, researchers have made revolutionary progress in many fundamental computer vision problems, notably object detection and image classification. Inspired by these advances, this dissertation investigates a few key problems in scene text reading based on deep learning.

First, this dissertation proposes a fast oriented text detection method. The key idea is called Segment Linking, where long lines of text are decomposed into two smaller, locally detectable elements, namely segments and links. A segment is a small slice of a whole word. It has the same height as the text line, but a fraction of its width. A link connects two neighboring segments, indicating that they belong to the same word. Segments and links are densely detected at multiple scales in one forward network pass. Following that, segments that are connected by links are combined into whole-word bounding boxes. The method effectively addresses the issue of detecting long and thin objects and is advantageous for its detection accuracy, its speed, and its applicability to both English and Chinese text.

Second, this dissertation proposes an end-to-end trainable neural network model for text recognition. The model comprises a convolutional neural network (CNN) and a recurrent neural network (RNN). It recognizes cropped text images directly, without the character detection and recognition steps that are common in traditional methods. Moreover, the model can be trained directly from images and text annotations, requiring no character-level annotations. The model is advantageous in terms of accuracy, small model size, and ease of training and deployment.

Third, this dissertation addresses the irregular text problem in text recognition. Irregular text means non-horizontal text. It can be caused by factors such as a non-frontal view angle and an oriented or curved layout. Irregular text is common in natural scenes and is difficult to recognize. This dissertation proposes a rectification-recognition neural network model, which comprises a rectification network and a recognition network. During inference, images are adaptively rectified by the rectification network before being recognized by the recognition network. The rectification network can be trained purely by the gradients back-propagated from the recognition network, thus requiring no human annotations. The proposed model handles a variety of text irregularities well, achieving state-of-the-art performance on multiple tasks.

Last, this dissertation investigates a new problem: language identification in scene text images. The identification of text language is in great demand in multilingual scenarios and has not been studied in previous work. This dissertation first collects a new dataset named SIW-13, which comprises 16,291 images annotated with 13 language categories. Then, this dissertation proposes a recognition model that combines a convolutional neural network with the discriminative clustering algorithm. The model is able to capture the subtle differences between languages with similar appearance. It achieves superior recognition performance on the proposed dataset and shows good explainability.

With the aforementioned studies, this dissertation builds a complete algorithmic framework for scene text detection and recognition and lays the groundwork for the study of scene text language identification.
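
The step in Segment Linking where linked segments are combined into whole-word boxes can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes segments are already detected and given as axis-aligned boxes (x_min, y_min, x_max, y_max), whereas the abstract describes oriented text, and that links are given as pairs of segment indices. The function names group_segments and merge_boxes are hypothetical.

```python
from collections import defaultdict


def group_segments(num_segments, links):
    """Group segment indices into connected components using union-find.

    `links` is a list of (i, j) index pairs; each pair is assumed to mean
    "segments i and j belong to the same word".
    """
    parent = list(range(num_segments))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_segments):
        groups[find(i)].append(i)
    return sorted(groups.values())


def merge_boxes(segments, links):
    """Return one enclosing box per group of linked segments.

    Simplifying assumption: boxes are axis-aligned, so the word box is just
    the min/max over the member segments' coordinates.
    """
    words = []
    for group in group_segments(len(segments), links):
        x0s, y0s, x1s, y1s = zip(*(segments[i] for i in group))
        words.append((min(x0s), min(y0s), max(x1s), max(y1s)))
    return words


# Hypothetical input: three linked segments forming one word, plus one
# unlinked segment that becomes a word on its own.
segments = [(0, 0, 10, 20), (10, 0, 20, 20), (20, 1, 30, 21), (50, 0, 60, 20)]
links = [(0, 1), (1, 2)]
words = merge_boxes(segments, links)
```

Treating segments as graph nodes and links as edges makes the combining step a connected-components problem, which is why a word of any length can be assembled from elements that are each detected locally.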
Keywords/Search Tags:Scene Text, Text Detection, Text Recognition, Language Identification, Deep Learning, Convolutional Neural Networks, Recurrent Neural Networks