
Arbitrary-Shape Text Detection Based on Convolutional Neural Network

Posted on: 2024-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhu
GTID: 2568307148962909
Subject: Computer Science and Technology
Abstract/Summary:
Text detection aims to locate text instances in an image and output the position and outline of each instance, providing a basis for subsequent visual tasks such as text recognition. In recent years, advances in detection theory and convolutional neural networks have brought great progress in both the accuracy and the speed of text detectors. They are widely used in everyday applications such as travel navigation and instant translation, and their practical value continues to grow. However, arbitrary-shaped text detection remains challenging, because encoder-decoder networks give little consideration to text characteristics such as shape and aspect ratio. In addition, their ability to discriminate, extract, and integrate text features is weak, which easily leads to suboptimal segmentation regions. To address these problems, this study proposes three practical solutions that achieve higher accuracy and speed and further develop the potential of two-stage text detection algorithms. The core ideas are as follows:

(1) To address inaccurate text regions, a text detection network based on an efficient text decoder is proposed. For a given scene image, a feature redistribution module (FRM) first extracts multi-scale text features in parallel and then combines text details through weight distribution, generating more accurate text regions and filtering out non-text features. Next, drawing inspiration from the Transformer architecture, we construct a CNN text decoder in which weight maps generated by the high-level decoding blocks dynamically adjust the text regions in the low-level features. This regularizes the refinement of text features and avoids over-activation of ambiguous regions. Experimental results on four datasets show that this method effectively alleviates the degradation in detection accuracy caused by coarse segmentation of text regions.

(2) To address the difficulty traditional decoders have in adapting to multi-scale targets and resisting interference from redundant background features, a text segmentation network based on multi-scale association capture and hourglass attention is proposed. A spatial association capture module (SACM) introduced at each scale significantly expands the receptive field at low computational cost. A dual-branch hourglass attention module (HAM) is then deployed, enabling the network to build weight maps dynamically, activating important text regions in the scene image and suppressing redundant features. The resulting semantic features are used to generate threshold maps and probability maps, from which binary maps are produced through differentiable binarization; these are mapped back onto the original images to obtain the text detection results.

(3) Considering the large scale gap between different text instances in everyday scenes, a text detection network based on an attention backbone and a synthetic feature pyramid is designed. It aims to describe text instances of different scales more accurately and to retain more of the original information. The encoder uses a modified ResNet as the backbone, replacing specific CNN layers with channel attention and spatial attention mechanisms to strengthen the extraction of text shapes and positions. The decoder uses the synthetic feature pyramid (SFPN) as its basic framework and adds intermediate-scale synthetic layers between adjacent layers, which makes feature fusion smoother and uses the original features more efficiently. Compared with existing text detection methods, this method achieves better performance at a faster speed.
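The abstract says the feature redistribution module (FRM) extracts multi-scale features in parallel and then combines them "through weight distribution", without further detail. As a rough illustration only, the Python sketch below fuses several same-sized branch outputs with softmax weights. The helper names (`softmax`, `global_mean`, `fuse_branches`) are hypothetical, and deriving each weight directly from a branch's global mean, rather than from a learned layer as selective-kernel style modules do, is a simplifying assumption and not the thesis's design.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W feature map


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_mean(fmap: Map) -> float:
    """Average value of a feature map (a stand-in for global pooling)."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def fuse_branches(branches: List[Map]) -> Map:
    """Hypothetical FRM-style fusion: weight each same-sized branch by a softmax
    over the branches' global means, then sum the weighted maps pixel by pixel."""
    weights = softmax([global_mean(b) for b in branches])
    h, w = len(branches[0]), len(branches[0][0])
    return [
        [sum(wt * b[i][j] for wt, b in zip(weights, branches)) for j in range(w)]
        for i in range(h)
    ]


# Two branches standing in for features computed with different receptive fields.
fused = fuse_branches([[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]])
```

Because the weights sum to 1, the fused map stays on the same scale as its inputs, and the branch with the stronger global response dominates.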
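In contribution (1), weight maps from high-level decoding blocks adjust the text regions in low-level features. One minimal way to picture this, assuming a 2x resolution gap, nearest-neighbour upsampling, and a sigmoid that turns high-level logits into weights in (0, 1), is element-wise gating of the low-level map. All names below are hypothetical; the abstract does not describe the actual decoder blocks.

```python
import math
from typing import List

Map = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def upsample2x(fmap: Map) -> Map:
    """Nearest-neighbour 2x upsampling: each value becomes a 2 x 2 block."""
    out: Map = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def gate_low_level(low: Map, high_logits: Map) -> Map:
    """Scale low-level features by weights derived from a half-resolution
    high-level map (hypothetical stand-in for the thesis's decoder blocks)."""
    weights = upsample2x([[sigmoid(v) for v in row] for row in high_logits])
    return [[f * wt for f, wt in zip(frow, wrow)] for frow, wrow in zip(low, weights)]


low = [[2.0] * 4 for _ in range(4)]
gated = gate_low_level(low, [[0.0, 20.0], [-20.0, 0.0]])
```

Regions where the high-level map is confident keep their low-level response, while regions it rejects are pushed toward zero, which is one way to limit over-activation of ambiguous areas.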
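The abstract states that SACM expands the receptive field at low computational cost but not how. Dilated (atrous) convolution is one common technique with that property, since a larger dilation rate widens a kernel's span without adding weights; whether SACM uses it is an assumption. The arithmetic below compares three stacked stride-1 3x3 convolutions with and without dilation.

```python
from typing import Sequence


def receptive_field(kernel: int, dilations: Sequence[int]) -> int:
    """Receptive field (in pixels, along one axis) of stacked stride-1
    convolutions that share one kernel size but use the given dilation rates."""
    rf = 1
    for d in dilations:
        # a k x k kernel with dilation d spans (k - 1) * d + 1 pixels,
        # so each layer widens the field by (k - 1) * d
        rf += (kernel - 1) * d
    return rf


def weight_count(kernel: int, channels: int, layers: int) -> int:
    """Weights in `layers` k x k convolutions with `channels` in and out (no bias).
    Dilation does not appear: it spreads the same weights further apart."""
    return layers * kernel * kernel * channels * channels


plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
print(plain, dilated, weight_count(3, 64, 3))  # 7 15 110592
```

With dilations 1, 2 and 4 the three layers cover 15 pixels instead of 7, while the weight count is the same in both cases.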
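Contribution (2) turns the probability map P and threshold map T into a binary map by differentiable binarization. The abstract does not give the formula; the sketch below assumes the form popularized by DBNet (Liao et al., 2020), B = 1 / (1 + exp(-k(P - T))) applied per pixel, with the amplifying factor k = 50 used in that paper. The thesis may use a different k. Because this function is smooth, gradients can reach the threshold map during training, which a hard step function would not allow.

```python
import math
from typing import List

Map = List[List[float]]


def differentiable_binarize(prob: Map, thresh: Map, k: float = 50.0) -> Map:
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))), element-wise.
    k = 50 follows DBNet and is an assumption for this thesis."""
    out: Map = []
    for prow, trow in zip(prob, thresh):
        row = []
        for p, t in zip(prow, trow):
            z = k * (p - t)
            # numerically stable logistic function
            if z >= 0:
                row.append(1.0 / (1.0 + math.exp(-z)))
            else:
                e = math.exp(z)
                row.append(e / (1.0 + e))
        out.append(row)
    return out


prob = [[0.9, 0.3, 0.5, 0.1]]
thresh = [[0.3, 0.3, 0.3, 0.3]]
binary = differentiable_binarize(prob, thresh)
```

Pixels well above their local threshold map to values near 1, pixels below it map near 0, and a pixel exactly at the threshold gives 0.5.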
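The binary map is then mapped back onto the original image to produce detection results. A simple, hypothetical version of that step groups foreground pixels into 4-connected components, one per text instance, and reports a box for each. Detectors in the DB family usually trace contours and expand the shrunk regions by an offset before output; that step is omitted here, and axis-aligned boxes stand in for the polygon outlines an arbitrary-shape detector would return. The 0.5 default threshold is also an assumption.

```python
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def label_text_regions(binary: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[Pixel]]:
    """Group 4-connected pixels whose value exceeds `threshold` into instances."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions: List[List[Pixel]] = []
    for i in range(h):
        for j in range(w):
            if binary[i][j] > threshold and not seen[i][j]:
                # breadth-first flood fill from this unvisited foreground pixel
                queue = deque([(i, j)])
                seen[i][j] = True
                pixels: List[Pixel] = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] > threshold and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions


def bounding_box(pixels: List[Pixel]) -> Tuple[int, int, int, int]:
    """Axis-aligned box as (x_min, y_min, x_max, y_max)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(xs), min(ys), max(xs), max(ys)


binary_map = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
regions = label_text_regions(binary_map)
boxes = [bounding_box(r) for r in regions]  # [(0, 0, 1, 1), (2, 1, 3, 2)]
```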
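Contribution (3) replaces specific ResNet layers with channel and spatial attention. The sketch below follows the channel-then-spatial order of CBAM (Woo et al., 2018) on a C x H x W nested list, but removes the learned parts as a simplifying assumption: the channel gate is a sigmoid of each channel's mean (no MLP, no max pooling), and the spatial gate is a sigmoid of the per-pixel channel mean plus channel max (instead of a learned convolution over those two maps). The abstract does not say which layers are replaced or how the attention is parameterized.

```python
import math
from typing import List

Tensor = List[List[List[float]]]  # C x H x W


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(x: Tensor) -> Tensor:
    """Scale each channel by a sigmoid of its spatial mean (no learned MLP)."""
    gates = [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in x]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


def spatial_attention(x: Tensor) -> Tensor:
    """Scale each pixel by a sigmoid of its channel mean plus channel max
    (a fixed stand-in for CBAM's learned convolution)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mask = [
        [
            sigmoid(sum(x[k][i][j] for k in range(c)) / c + max(x[k][i][j] for k in range(c)))
            for j in range(w)
        ]
        for i in range(h)
    ]
    return [[[x[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)] for k in range(c)]


def attention_block(x: Tensor) -> Tensor:
    """Channel attention followed by spatial attention, in CBAM's order."""
    return spatial_attention(channel_attention(x))


features = [[[1.0, 0.0]], [[0.0, 0.0]]]  # C=2, H=1, W=2
refined = attention_block(features)
```

The channel step decides which feature maps matter, and the spatial step then decides where in the image they matter, which matches the abstract's aim of strengthening the extraction of text shape and position.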
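The synthetic feature pyramid (SFPN) in contribution (3) adds intermediate-scale layers between adjacent pyramid levels. The abstract does not define how such a layer is built, so the sketch below is hypothetical: `synthetic_layer` resizes a finer and a coarser level to a size halfway between them with nearest-neighbour sampling and averages the two. Learned fusion weights and smoother interpolation are left out for brevity.

```python
from typing import List

Map = List[List[float]]


def resize_nearest(fmap: Map, out_h: int, out_w: int) -> Map:
    """Nearest-neighbour resize to out_h x out_w."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i * h // out_h][j * w // out_w] for j in range(out_w)] for i in range(out_h)]


def synthetic_layer(fine: Map, coarse: Map) -> Map:
    """Hypothetical intermediate-scale layer: bring two adjacent levels to the
    size halfway between them and average them element-wise."""
    out_h = (len(fine) + len(coarse)) // 2
    out_w = (len(fine[0]) + len(coarse[0])) // 2
    a = resize_nearest(fine, out_h, out_w)
    b = resize_nearest(coarse, out_h, out_w)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


fine = [[1.0] * 8 for _ in range(8)]    # e.g. a stride-4 level
coarse = [[3.0] * 4 for _ in range(4)]  # the adjacent stride-8 level
mid = synthetic_layer(fine, coarse)     # 6 x 6 intermediate level
```

Inserting such a level means each fusion step bridges a smaller scale jump than the usual factor of two between adjacent pyramid levels.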
Keywords/Search Tags: Text detection, Convolutional neural network, Feature fusion, Image segmentation