Scene Text Detection Algorithm Based On Non-local Attention And Feature Enhancement

Posted on:2024-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:J H Luo

GTID:2568307136988669

Subject:Circuits and systems

Abstract/Summary:

With the development of deep learning, text detection technology has made significant progress, has been widely applied in various fields, and has become a current research hotspot. However, because natural scene images have complex backgrounds and diverse fonts, deep learning-based scene text detection algorithms often face the following problems: (1) poor detection performance for closely connected text instances; (2) inadequate feature extraction when a lightweight backbone network is used; (3) a trade-off between accuracy and speed, where more accurate algorithms tend to sacrifice detection speed. To address these issues, this thesis proposes a scene text detection algorithm based on non-local attention and feature enhancement. The specific research content is as follows:

(1) To improve detection of closely connected text instances, this thesis builds on the differentiable binarization of DBNet and uses the lightweight ResNet-18 as the backbone network. Furthermore, a Global Context Network (GCNet) block is incorporated into the feature extraction structure to expand the model's receptive field. This not only captures contextual information in the region but also reduces computational complexity, preserving the portability of the network.

(2) To compensate for the limited feature extraction ability of a lightweight backbone, this thesis replaces the original feature pyramid structure with a feature pyramid enhancement module and a feature fusion module. The feature pyramid enhancement module propagates high-level semantic features from top to bottom, enriching the semantic information of the entire feature pyramid, and also propagates position information from bottom to top, allowing better localization of small targets in the image. The feature fusion module integrates feature information from different levels to improve feature representation, enabling the model to better distinguish between different samples. Additionally, the regular convolutions in the feature pyramid enhancement module are replaced with depth-wise separable convolutions to reduce network complexity while maintaining model accuracy.

(3) In scene text detection, text regions occupy a small proportion of the image and negative samples a large proportion, so assigning equal weights to all classes in the binary cross-entropy loss leads to low training efficiency and fails to achieve the expected optimization effect. To solve this problem, this thesis replaces the binary cross-entropy loss with Focal Loss. Focal Loss not only adjusts the weights of positive and negative samples but also, through its modulating factor, dynamically reduces the weights of easily classified samples during training. Training therefore focuses quickly on the samples that are hard to distinguish, improving the training efficiency and accuracy of the model.
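The Focal Loss idea in item (3) can be illustrated with a minimal sketch. It uses the standard binary Focal Loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and is not taken from the thesis. The function names are hypothetical, and the values alpha = 0.25 and gamma = 2.0 are the commonly used defaults, assumed here because the abstract does not state the settings actually used.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one pixel.

    p: predicted probability that the pixel is text; y: 0 or 1 label.
    """
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one pixel (alpha and gamma are assumed defaults).

    alpha balances positive against negative samples; the modulating
    factor (1 - p_t) ** gamma shrinks the loss of well-classified pixels.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over flat lists of probabilities and labels."""
    losses = [binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # An easy positive (p = 0.9) and a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        ratio = binary_focal_loss(p, 1) / binary_cross_entropy(p, 1)
        print(f"p={p}: focal/BCE weight = {ratio:.4f}")
```

In this sketch, the easy positive keeps only a small fraction of its cross-entropy loss while the hard positive keeps far more, which is the reweighting effect the abstract describes. Setting gamma to 0 removes the modulating factor and leaves an alpha-weighted cross-entropy.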

Keywords/Search Tags:

scene text detection, differentiable binarization, GCNet, feature enhancement, Focal Loss

Related items

1	Research On Natural Scene Text Detection Algorithm Based On Deep Learning
2	Research On Scene Text Detection And Image Classification Based On Convolutional Neural Network
3	Research On Scene Text Detection Via Feature Fusion
4	Research On Text Line Detection In Natural Scene Based On Deep Learning
5	Research On Scene Text Detection Based On Deep Learning
6	Text Detection And Localization In Natural Scene Images
7	Natural Scene Text Detection Based On Attention And Feature Enhancement
8	Research On Multi-features Natural Scene Text Detection Based On Image Enhancement
9	Scene Text Detection Based On Deep Learning
10	Research On Scene Text Detection Algorithm Based On Improved Feature Pyramid Network And Feature Enhancement Fusion