
End-to-end Text Recognition In Natural Scene Images

Posted on: 2022-10-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M H Liao
Full Text: PDF
GTID: 1488306572476404
Subject: Information and Communication Engineering
Abstract/Summary:
Text is one of the most influential inventions of mankind. The precise, high-level semantic information it carries helps us understand the world. The natural scene text studied in this dissertation is text that appears in natural images. Text appears in many scenes and on many objects, such as street signs, license plates, shop signs, merchandise, posters, and certificates. This has given rise to a large number of practical applications based on text recognition, such as geographic positioning, license plate recognition, shop sign recognition, and certificate recognition. Natural scene text recognition differs from traditional document text recognition: it is characterized by complex backgrounds, diverse shapes, flexible arrangement orientations, rich fonts, and large variations in scale, all of which bring new challenges to text recognition. This dissertation studies key scientific problems in four aspects: the tradeoff between accuracy and speed in text detection, complex-shaped text detection, complex-shaped text recognition, and the combination of text detection and text recognition. Its contributions are as follows.

(1) A regression-based multi-oriented text detection algorithm is proposed. The algorithm explores two different ways of representing multi-oriented text regions and adjusts the receptive field of the convolutions according to the aspect ratio of the text. To a certain extent, it solves the problems of inaccurate localization representation and mismatched receptive fields. Its designs tailored to the characteristics of text, including the bounding-box representation, the configuration of the default boxes, the convolution kernel size, and data augmentation for small-scale text, greatly improve the accuracy of text detection while keeping the algorithm simple. In addition, the algorithm is the first to use text recognition results to improve the accuracy of text detection. This verifies that text recognition helps text detection and lays a foundation for the subsequent end-to-end text recognition algorithm, which integrates the detection and recognition modules into a unified model.

(2) A segmentation-based arbitrary-shaped text detection algorithm is proposed. The algorithm introduces a differentiable binarization module and a dynamic threshold module, so that the binarization process can be optimized jointly, end to end, with the segmentation network. The differentiable binarization module greatly improves the quality of text segmentation and the accuracy of text detection while keeping post-processing simple. Benefiting from pixel-level segmentation prediction, the algorithm can accurately describe text regions of arbitrary shape. It can detect text with a variety of complex shapes, including multiple orientations, extreme aspect ratios, and irregular shapes, and it achieves the best detection accuracy and the fastest inference speed on multiple standard natural scene text detection datasets.

(3) An end-to-end recognition algorithm for arbitrary-shaped text based on instance segmentation is proposed. It combines the advantages of regression and segmentation and performs arbitrary-shaped text detection in an instance segmentation manner. Unlike previous one-dimensional sequence-to-sequence text recognition algorithms, this algorithm introduces character segmentation and spatial attention modules to decode text sequences in two-dimensional space, which both reduces the difficulty of training and improves recognition of irregularly shaped text. In addition, to address the shortcomings and bottlenecks of previous end-to-end text recognition algorithms based on the region proposal network, the algorithm replaces the region proposal network with a segmentation proposal network, further improving robustness to complex-shaped text. The anchor-free segmentation proposal network overcomes the limitations of the region proposal network in handling text instances with dense rotations, extreme aspect ratios, and irregular shapes. It also provides more accurate proposals, which improves the robustness of text detection and recognition. As a result, the algorithm is significantly more robust to rotations, aspect ratios, and irregular shapes, and it achieves the best results on multiple challenging natural scene text datasets.

In summary, this dissertation proposes a series of complex-shaped scene text detection algorithms and end-to-end text recognition algorithms that effectively address the key problems in end-to-end text recognition and provide effective support for subsequent natural scene text research.
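The differentiable binarization idea in contribution (2) can be illustrated with a small sketch. The abstract does not give a formula, so this assumes the hard step function is approximated by a steep sigmoid over the difference between a predicted probability and a per-pixel learned threshold. The function names, the amplification factor `k = 50`, and the toy values are illustrative assumptions, not details taken from the abstract.

```python
import math

def hard_binarize(prob, thresh):
    """Standard step binarization: its gradient is zero almost everywhere,
    so it cannot be trained jointly with the segmentation network."""
    return 1.0 if prob >= thresh else 0.0

def soft_binarize(prob, thresh, k=50.0):
    """Steep-sigmoid approximation of the step function.
    k is an assumed amplification factor; larger k is closer to a hard step."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))

def soft_binarize_grad(prob, thresh, k=50.0):
    """Derivative of soft_binarize with respect to prob.
    It is non-zero near the threshold, which is what allows gradients to flow."""
    b = soft_binarize(prob, thresh, k)
    return k * b * (1.0 - b)

def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply soft binarization pixel by pixel, using a per-pixel
    (dynamic) threshold map instead of one global threshold."""
    return [
        [soft_binarize(p, t, k) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob_map, thresh_map)
    ]

# Toy 2x2 example (hypothetical values).
prob_map = [[0.9, 0.1], [0.6, 0.4]]
thresh_map = [[0.3, 0.3], [0.5, 0.5]]
approx_binary_map = binarize_map(prob_map, thresh_map)
```

Pixels well above their threshold map to values close to 1 and pixels well below map to values close to 0, while pixels near the threshold keep a usable gradient. At inference time a plain threshold could still be applied, which is consistent with the simple post-processing the abstract mentions.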
Keywords/Search Tags: Scene Text, Text Detection, Text Recognition, End-to-End Text Recognition