Research On Text Location Method Of Natural Scene Based On Deep Learning

Posted on:2023-02-04

Degree:Master

Type:Thesis

Country:China

Candidate:W J Gu

Full Text:PDF

GTID:2568306776475644

Subject:Computer technology

Abstract/Summary:

With the development of computer vision, text location in natural scenes has been widely used in fields such as industrial automation, autonomous driving, and real-time translation. However, because natural-scene text appears against complex, changeable backgrounds and varies greatly in scale, existing location methods suffer from an insufficient receptive field and inefficient feature fusion, and there is still room to improve location accuracy. To address these problems, this thesis builds on the regression-based location method: it introduces an adaptive feature fusion mechanism and enlarges the receptive field of the network, then proposes a location network with a Transformer backbone, and finally implements a scene text location and recognition system. The research work of this thesis is summarized as follows:

(1) A scene text location network based on an improved receptive field and adaptive feature fusion is proposed to improve the location accuracy for text with extreme aspect ratios and multiple scales. First, a densely connected dilated convolution module is introduced into the backbone network to obtain multi-scale information with denser sampling points. At the same time, dilated convolutions with different dilation rates enlarge the receptive field of the network, compensating for the high-level semantic information lost because of a limited receptive field. Second, in the feature pyramid stage, an adaptive feature fusion module is introduced that assigns spatial weights to features of different scales, learns to adjust the scale information during fusion, and maintains the scale invariance of the features. Experimental results show that this method effectively improves the location accuracy for extreme-aspect-ratio and multi-scale text.

(2) A scene text location network based on a Transformer backbone is proposed, which introduces the Transformer structure into the text location task to strengthen long-distance dependencies between text features and thereby improve location accuracy. An encoder-decoder network combining Transformer and CNN is designed: multiple stacked Transformer encoders extract features from text images and capture long-distance dependencies to obtain a global receptive field. The traditional feature pyramid network is abandoned in favor of a depthwise separable convolutional decoder, which fuses local features and reduces the number of network parameters. Experimental results show that this method improves text location accuracy and verify the effectiveness of the Transformer for text location tasks.

(3) Based on the text location method proposed in this thesis, a scene text location and recognition system is designed and implemented. Its design process is described in terms of feasibility analysis, overall design, and functional design. Qt is used to build the system, Python is used to train the model, and MySQL is used for data storage. The resulting system effectively turns the proposed method into a practical application.
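
Item (1) says that dilated convolutions with different rates enlarge the receptive field and that dense connections give denser sampling. The abstract states neither the kernel size nor the dilation rates, so the sketch below assumes 3x3 kernels, stride 1, and hypothetical rates (1, 2, 4). It also assumes the module is wired like DenseASPP, where each layer sees the concatenated outputs of all earlier layers. Under those assumptions it compares the receptive field of a plain chain with the set of receptive-field sizes reachable through a densely connected stack.

```python
from itertools import combinations
from typing import List, Sequence


def effective_kernel(k: int, d: int) -> int:
    """Spatial extent of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def stacked_rf(k: int, rates: Sequence[int]) -> int:
    """Receptive field of a chain of stride-1 k x k convs with the given dilation rates."""
    rf = 1
    for d in rates:
        rf += effective_kernel(k, d) - 1
    return rf


def dense_rfs(k: int, rates: Sequence[int]) -> List[int]:
    """Receptive-field sizes reachable in a densely connected stack.

    Assumption (DenseASPP-style wiring): every layer receives the concatenated
    outputs of all earlier layers, so any ordered subset of the layers forms a
    path from the module input to the module output.
    """
    sizes = set()
    for n in range(1, len(rates) + 1):
        for path in combinations(rates, n):
            sizes.add(stacked_rf(k, path))
    return sorted(sizes)


# Hypothetical configuration: 3x3 kernels with dilation rates 1, 2, 4.
RATES = (1, 2, 4)
print(stacked_rf(3, RATES))  # plain chain: 15
print(dense_rfs(3, RATES))   # dense paths: [3, 5, 7, 9, 11, 13, 15]
```

Under these assumptions the plain chain exposes a single 15-pixel receptive field at its output, while the densely connected paths cover seven sizes from 3 to 15, which is one way to read "multi-scale information with denser sampling points".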
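
Item (1) also describes an adaptive feature fusion module that assigns spatial weights to features of different scales. The abstract gives no formula, so the following is only a minimal sketch in the spirit of ASFF (adaptively spatial feature fusion), not the thesis's module. It assumes the pyramid levels have already been resized to a common resolution, reduces each map to a single channel for readability, and takes the per-level weight logits as given, whereas a trained network would produce them with learned layers. A per-pixel softmax turns the logits into weights that sum to 1, so each location can favor a different scale.

```python
import math
from typing import List, Tuple

Map = List[List[float]]  # one H x W single-channel feature map


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features: List[Map], logits: List[Map]) -> Tuple[Map, List[List[List[float]]]]:
    """Fuse L same-sized maps with per-pixel softmax weights over the L levels.

    `logits[lvl][i][j]` is the (assumed, externally supplied) score of level
    `lvl` at pixel (i, j); in a trained network it would come from learned layers.
    """
    levels = len(features)
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    weights: List[List[List[float]]] = [[[] for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            alpha = softmax([logits[lvl][i][j] for lvl in range(levels)])
            weights[i][j] = alpha
            fused[i][j] = sum(a * features[lvl][i][j] for lvl, a in enumerate(alpha))
    return fused, weights


# Two hypothetical pyramid levels, already resized to a 1 x 2 map.
features = [[[1.0, 1.0]], [[3.0, 3.0]]]
logits = [[[0.0, 5.0]], [[0.0, -5.0]]]
fused, weights = adaptive_fuse(features, logits)
print(fused)  # left pixel 2.0 (equal weights); right pixel close to 1.0
```

The left pixel blends both levels equally, while the right pixel, whose logits strongly favor level 0, stays close to that level's value.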
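
Item (2) attributes the global receptive field of the stacked Transformer encoders to their ability to capture long-distance dependencies. The sketch below shows the underlying mechanism, single-head scaled dot-product self-attention over a short sequence of token vectors (for example, flattened image-patch embeddings). For brevity it assumes identity query, key, and value projections; a real encoder learns these projections and adds multiple heads, feed-forward layers, and normalization. Changing only the last token changes the output at the first position, which a small convolution kernel cannot do for distant positions.

```python
import math
from typing import List

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: the query, key, and value projections are the
    identity, so Q = K = V = tokens. Every output row is a softmax-weighted
    average of all input tokens, which gives each position a global receptive field.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens)) for c in range(d)])
    return out


# Hypothetical 2-D embeddings for four image patches.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
perturbed = tokens[:-1] + [[5.0, -5.0]]  # change only the last patch
print(self_attention(tokens)[0] != self_attention(perturbed)[0])  # True
```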
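
Item (2) also states that the depthwise separable convolutional decoder reduces the number of network parameters. The arithmetic behind that claim can be checked directly. A standard k x k convolution needs k*k*C_in*C_out weights, while a depthwise k x k convolution followed by a 1x1 pointwise convolution needs k*k*C_in + C_in*C_out. The 256-channel width below is a hypothetical example, not a figure from the thesis.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1x1 conv mixes channels
    return depthwise + pointwise


# Hypothetical decoder layer: 3x3 kernel, 256 -> 256 channels, no bias.
std = conv_params(3, 256, 256)
sep = dw_separable_params(3, 256, 256)
print(std, sep, f"{sep / std:.3f}")  # 589824 67840 0.115
```

Without bias the ratio is exactly 1/C_out + 1/k^2, so for 3x3 kernels the saving approaches roughly 9x as the channel count grows.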

Keywords/Search Tags:

Scene Text Location, Deep Learning, Dilated Convolution, Multi-scale Feature Fusion, Self-attention Mechanism

Related items

1	Research On Algorithm Of Crack Detection Based On Multi-scale Dilated Convolution And Dual Attention Mechanism
2	Research On Scene Text Detection Algorithm Combining Dual Attention Mechanism And Dilated Convolution
3	Research On Multi-oriented Scene Text Localization And Detection Based On Multi-scale And Big Receptive Field Deep Learning Features
4	Research And Application Of Natural Scene Text Detection Algorithm Based On Deep Learning
5	Research On Text Detection Of Natural Scene Based On Deep Learning
6	Research On Concrete Pavement Crack Detection Algorithm With Optimized Side-Output Fusion And Attention Mechanism
7	Research On Image Inverse Halftoning Method Based On Multi-scale Feature Fusion Of Gan Network
8	Single-Person Target Tracking Based On Multi-Scale Feature Fusion
9	Research On Scene Text Detection Technology Based On Multi-Scale Information Fusion
10	Research On Scene Text Detection Method Based On Deep Neural Network