
Research On Scene Text Detection Algorithm Based On Improved Feature Pyramid Network And Feature Enhancement Fusion

Posted on: 2024-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Feng
GTID: 2568307100988689
Subject: Computer Science and Technology
Abstract/Summary:
Text detection in natural scenes refers to automatically identifying and locating text regions in natural scene images or videos. With the advancement of deep learning, this technology has been widely applied in fields such as traffic supervision, intelligent driving, image retrieval, and classification. However, text regions in scene images are diverse and complex: they vary in font, occlusion, and lighting, and often appear against complex backgrounds, all of which pose challenges for detection.

Current text detection models typically use image classification networks as backbone networks. However, image classification and object detection tasks have different requirements, and redesigning and pretraining a new feature extraction network requires significant computational resources. To improve efficiency, this study adopts an improved composite backbone network that directly combines pretrained networks and introduces a dynamic gate mechanism to reduce redundant information transmission and enhance the feature extraction capability of the backbone network.

Current text detection algorithms increasingly use high-resolution images as inputs because they provide richer semantic features. However, this also requires models to have a larger receptive field. Therefore, this paper proposes a dual-branch attention-guided feature pyramid method that expands the receptive field through a dual-branch feature fusion module, fusing coarse-grained and fine-grained features. At the same time, an attention-guided module is introduced to enhance the semantic information of the features and reduce the disruption of text boundary information caused by dilated convolutions.

To address information loss in the fusion of multi-scale feature maps, this paper proposes the Feature Enhancement Fusion Module (FEFM). By applying attention mechanisms at the feature level, spatial positions, and output channels, the module enhances the network's perception capability and makes effective use of multi-scale features while avoiding information sparsity and loss.

Finally, ablation experiments on the public ICDAR2015 dataset demonstrate the effectiveness of each proposed module, and the algorithm is compared with current mainstream scene text detection algorithms on the ICDAR2015, ICDAR2017, and MSRA-TD500 datasets. Compared to these algorithms, the precision (P), recall (R), and F-measure of the proposed algorithm are improved to a certain extent. Recall in particular improves significantly, which greatly reduces missed text detections. The F-measure of the proposed algorithm reaches 85.8%, 74.5%, and 84.8% on ICDAR2015, ICDAR2017, and MSRA-TD500, respectively.
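For readers unfamiliar with the reported metric, the F-measure is conventionally the harmonic mean of precision P and recall R. The following is a minimal sketch of that standard definition only. The function name `f_measure` and the input values are hypothetical, and the abstract does not specify the exact evaluation protocol used in the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # No correct detections at all: define the score as 0 to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical values for illustration; these are not results from the thesis.
    p, r = 0.90, 0.80
    print(f"P={p:.2f}, R={r:.2f}, F-measure={f_measure(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a gain in recall with precision held steady raises the F-measure, which fits the abstract's emphasis on reducing missed detections.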
Keywords/Search Tags: Text detection, Feature pyramid, Attention mechanism, Feature fusion