
Text Detection And Recognition Based On Semantic Segmentation And Attention

Posted on: 2024-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Zhong
Full Text: PDF
GTID: 1528307070960129
Subject: Computer application technology
Abstract:
Scene images usually contain rich text, which is an important carrier of information. With the spread of intelligent devices, the volume of scene text such as store names, commodity information, traffic signs and advertising slogans is growing rapidly. Converting text in scene images into information that computers can understand is an important research topic in computer vision, and scene text detection and recognition have become active research topics in recent years. The task of text detection is to obtain the positions of text, and the task of text recognition is to obtain the corresponding character sequence.

The diversity and complexity of natural scene images pose great challenges to text detection and recognition, and several problems remain to be solved. Firstly, some text instances in outdoor scenarios are characterized by character shadows, and current text detection methods cannot detect these instances accurately. Secondly, text images with complex backgrounds are difficult to recognize. Thirdly, traditional text recognition methods adopt a fixed decoding order, which causes errors to accumulate and degrades recognition performance. Fourthly, neighbouring text and curved text are hard to detect and recognize correctly. Based on semantic segmentation and attention, this thesis explores deep learning algorithms to improve the performance of text detection and recognition. The main contributions are summarized as follows:

(1) To locate scene text characterized by character shadows accurately, the local resultant gradient vector difference (LRGVD) is proposed for text detection. Gradient calculation is an important method in image processing, and there is usually a significant difference between text and its background. In this thesis, the boundaries are first detected by calculating the gradient. Secondly, the posterior probability of missing background pixels is calculated to recover the character background information and to form the text candidates. Finally, a semantic segmentation method is used to detect the text area among the text candidates. Experimental results on the custom dataset Shadow Text and on standard text image datasets show that the proposed LRGVD can effectively improve the recognition performance.

(2) To address the performance degradation on scene text images with complex backgrounds, a method based on a semantic generative adversarial network and balanced attention is proposed for arbitrarily oriented scene text recognition. The semantic generative adversarial network consists of a semantic generation module and a semantic discrimination module. For text images with complex backgrounds, the semantic generation module generates simple semantic features that have the same distribution as those of text images with simple backgrounds. The semantic discrimination module distinguishes whether semantic features come from simple-background or complex-background images. In addition, a balanced attention module is introduced to alleviate the problem of attention drift. The balanced attention mechanism first learns a balancing parameter from the visual glimpse vector and the semantic glimpse vector, and then performs the balancing operation to obtain the final balanced glimpse vector. Experimental results on public benchmarks attest to the effectiveness of the method.

(3) To solve the problem that a fixed decoding order cannot make full use of contextual information, a scene text recognition method based on an adaptive decoding order is proposed. The method decodes in an adaptive order, so that good-quality characters are decoded first, followed by low-quality characters. The decoder consists of a Random Order Generation module and a Visual-Content-Position module: the former learns to decode with random decoding orders, and the latter establishes robust connections among visual information, content and position. The proposed method achieves state-of-the-art results on the custom dataset OLQT and on standard text image datasets.

(4) For neighbouring text detection and arbitrarily shaped text recognition, the text proposals with location-awareness-attention network is proposed for arbitrarily shaped scene text detection and recognition. In the detection stage, the method first extracts the central area of the text through the Center Mask Predictor module to generate preliminary bounding boxes. Tight bounding boxes are then fitted from the inside outwards, which avoids the false and missed detections caused by neighbouring text. Finally, the Mask Predictor module is used to obtain the accurate position of the text, which helps to detect multi-oriented and curved text. In the recognition stage, the text position information is fused and a two-dimensional attention weight is learned through the location-awareness-attention module, which benefits the recognition of curved text. Experimental results on standard text recognition datasets validate the effectiveness of the proposed method.
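
The balancing step described in contribution (2) can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the balancing parameter is taken to be a scalar gate produced by a sigmoid over the concatenated glimpse vectors, and the balancing operation is taken to be a convex combination of the two vectors. The names `balanced_glimpse`, `weights` and `bias` are hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def balanced_glimpse(visual, semantic, weights, bias=0.0):
    """Blend a visual and a semantic glimpse vector with a scalar gate.

    Assumed form (not confirmed by the abstract):
        beta     = sigmoid(weights . [visual; semantic] + bias)
        balanced = beta * visual + (1 - beta) * semantic
    """
    if len(visual) != len(semantic):
        raise ValueError("glimpse vectors must have the same length")
    if len(weights) != len(visual) + len(semantic):
        raise ValueError("weights must match the concatenated glimpse length")

    # The gate sees both glimpse vectors at once.
    concat = list(visual) + list(semantic)
    beta = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)

    # Convex combination: beta near 1 favours the visual glimpse,
    # beta near 0 favours the semantic glimpse.
    balanced = [beta * v + (1.0 - beta) * s for v, s in zip(visual, semantic)]
    return beta, balanced


# Toy inputs; in a trained model the weights would be learned.
visual = [1.0, 0.0]
semantic = [0.0, 1.0]
beta, glimpse = balanced_glimpse(visual, semantic, weights=[0.0, 0.0, 0.0, 0.0])
```

With all-zero weights the gate is neutral, so the two glimpse vectors contribute equally. A learned gate could instead lean on the semantic glimpse when the visual glimpse is unreliable, which is one plausible way such a mechanism might reduce attention drift.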
Keywords: Text Detection, Text Recognition, Semantic Segmentation, Local Resultant Gradient Vector Difference, Semantic GAN, Attention