Research On Text Detection In Complex Scenes

Posted on: 2022-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Hou
GTID: 1488306320974519
Subject: Computer Science and Technology
Abstract/Summary:
Scene text is ubiquitous and carries abundant valuable information, for example on traffic signs, billboards, and guideposts. Scene text detection is an important prerequisite for various applications, such as autonomous driving, intelligent transportation, translation, video analysis, and image retrieval. However, complex backgrounds and variations in font, size, color, and orientation make scene text detection a challenging task. With the rapid development of deep learning, scene text detection methods have made great progress in recent years. In this thesis, we focus on complex scene text with arbitrary orientations, varied aspect ratios, and arbitrary shapes. In summary, the main contributions of this thesis are three-fold:

(1) We propose an innovative decoupled Hidden Anchor Mechanism (HAM) for arbitrary-oriented scene text detection. Direct regression and anchors are the two prevailing and most effective mechanisms in scene text detection. However, direct regression-based methods can be difficult to optimize because they lack anchors as references, while anchor-based methods depend on careful anchor design, which degrades their robustness in complex scenes. To address these problems, we propose an HAM designed specifically for scene text detection. The anchor predictions are regarded as hidden layers, and the weighted sum of these predictions is integrated into a direct regression-based network. Hence, the architecture of our HAM retains the simplicity of direct regression-based methods. Moreover, because anchors serve as references, it is easier to optimize than direct regression-based methods. In this way, our network takes advantage of both the direct regression and the anchor mechanisms. In addition, we decouple three-dimensional anchors into three kinds of one-dimensional anchors, greatly reducing the number of anchors in text bounding box matching without performance degradation. We also propose a post-processing technique for long text detection, named Iterative Regression Box (IRB), which adds little computational cost and can be easily generalized to other methods. Experiments on the ICDAR 2015, ICDAR 2017 MLT, and MSRA-TD 500 datasets demonstrate that the proposed method achieves state-of-the-art performance.

(2) We propose a novel Decoupled Feature Pyramid Network (DFPN) architecture to enhance the discriminability of features for varied aspect ratios, especially for rectangular text in long lines. Detecting arbitrary-shape scene text is challenging mainly because of varied aspect ratios, curves, and scales. In this thesis, we propose a novel arbitrary-shape scene text detection method based on DFPN and regression-based linking (RegLink). Our DFPN decouples the width and height of the feature maps generated by FPN to enhance the discriminability of features for varied aspect ratios. Because quadrilateral regression results cannot directly represent curved text, we propose a simple yet effective RegLink that links pixels into text instances, based on the observation that pixels in the same curved text share an identical target quadrilateral. Thus, RegLink extends a rotated-rectangle text detector to curved text. In addition, we propose a Feature Scale Module (FSM) to enhance the robustness of features to varied scales. In this way, our method can effectively detect scene text of arbitrary shape, and experimental results on four publicly available challenging datasets demonstrate its effectiveness.

(3) We propose an innovative Kernel Proposal Network (KPN) for arbitrary-shape scene text detection. In this task, segmentation-based methods are popular but generally rely on complex clustering strategies to group pixels into different text instances. In this thesis, we propose a KPN that replaces computationally expensive clustering with efficient classification on instance-independent feature maps. Concretely, our KPN predicts one key center point for each text instance and extracts a kernel proposal (i.e., a dynamic convolution kernel) from the corresponding position in the classification-oriented embedding feature maps. Each kernel proposal then individually convolves all embedding feature maps to generate one corresponding channel of the output feature map. In this way, our KPN separates text instances into individual channels of feature maps and independently predicts their masks to generate the final contours without clustering. By abandoning complex clustering, our KPN saves computation while improving robustness against tiny intervals between text instances and unclear boundaries. Experimental results on four challenging datasets demonstrate the efficiency and effectiveness of our method.
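The kernel proposal step in contribution (3) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function name `kernel_proposal_masks`, the use of a 1x1 kernel taken directly from the embedding vector at each center point, the sigmoid activation, and the 0.5 threshold are all assumptions made for illustration, and the center points are supplied by hand rather than predicted by a network.

```python
import numpy as np

def kernel_proposal_masks(embedding, centers, threshold=0.5):
    """Separate instances into channels with per-instance dynamic kernels.

    embedding: array of shape (C, H, W), the embedding feature maps.
    centers:   list of (row, col) key center points, one per text instance
               (assumed given here; a real detector would predict them).
    Returns a boolean array of shape (N, H, W), one mask per instance.
    """
    rows = np.array([r for r, _ in centers])
    cols = np.array([c for _, c in centers])
    # Kernel proposal: the C-dimensional embedding vector at each center,
    # used as a 1x1 dynamic convolution kernel (an assumed kernel size).
    kernels = embedding[:, rows, cols].T              # shape (N, C)
    # Each kernel convolves all embedding maps and yields one output channel.
    logits = np.einsum("nc,chw->nhw", kernels, embedding)
    probs = 1.0 / (1.0 + np.exp(-logits))             # assumed sigmoid
    return probs > threshold

# Toy input: two "text instances" with opposite embeddings, side by side.
embedding = np.zeros((2, 2, 4))
embedding[:, :, :2] = np.array([4.0, -4.0])[:, None, None]   # left instance
embedding[:, :, 2:] = np.array([-4.0, 4.0])[:, None, None]   # right instance

masks = kernel_proposal_masks(embedding, centers=[(0, 0), (1, 3)])
```

In this toy input, each instance ends up in its own output channel by a single dot product per pixel, with no clustering step, which is the property the abstract attributes to KPN.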
Keywords/Search Tags:Scene Text Detection, deep learning, Convolutional Neural Networks, dynamic convolution kernel