
Research On Text Location Method Of Natural Scene Based On Deep Learning

Posted on: 2023-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: W J Gu
GTID: 2568306776475644
Subject: Computer technology
Abstract/Summary:
With the development of computer vision, text location in natural scenes has been widely applied in fields such as industrial automation, autonomous driving, and real-time translation. However, because natural scenes have complex, changeable text backgrounds and large variations in text scale, existing location methods suffer from insufficient receptive fields and inefficient feature fusion, and location accuracy still has room for improvement. To address these problems, this thesis builds on regression-based location methods: it introduces an adaptive feature fusion mechanism, enlarges the receptive field of the network, proposes a location network based on a Transformer backbone, and implements a scene text location and recognition system. The research work of this thesis is summarized as follows:

(1) A scene text location network based on an enlarged receptive field and adaptive feature fusion is proposed to improve location accuracy for text with extreme aspect ratios and for multi-scale text. First, a densely connected dilated convolution module is introduced into the backbone network to obtain multi-scale information with denser sampling points. At the same time, dilated convolutions with different dilation rates enlarge the receptive field of the network, which compensates for the loss of high-level semantic information caused by a limited receptive field. In the feature pyramid stage, an adaptive feature fusion module is introduced; it assigns spatial weights to features of different scales, learns to adjust the scale information adaptively during feature fusion, and maintains the scale invariance of the features. The experimental results show that this method effectively improves location accuracy for extreme-aspect-ratio and multi-scale text.

(2) A scene text location network based on a Transformer backbone is proposed. It brings the Transformer structure into the text location task to strengthen the long-distance dependencies between text features, thereby improving location accuracy. An encoder-decoder network based on Transformer-CNN is designed: multiple stacked Transformer encoders extract features from text images and capture long-distance dependencies to obtain a global receptive field. The traditional feature pyramid network is abandoned, and a depthwise separable convolutional decoder is introduced to fuse local features and reduce the number of network parameters. The experimental results show that this method improves text location accuracy and verify the effectiveness of the Transformer for text location tasks.

(3) Based on the text location methods proposed in this thesis, a scene text location and recognition system is designed and implemented. The design process of the system is described in terms of feasibility analysis, overall design, and functional design. Qt is used to build the system, Python is used to train the models, and MySQL is used for data storage. The resulting system puts the proposed methods into practical use.
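Two claims in the abstract rest on simple arithmetic: dilated convolutions enlarge the receptive field, and a depthwise separable convolution has fewer parameters than a standard one. The Python sketch below illustrates both. It is not the thesis's implementation: the 3x3 kernel, the dilation rates 1, 2, 4, and the 256-channel width are assumed values chosen only for illustration, the layers are treated as stacked sequentially with stride 1 (the longest path through a densely connected block), and bias terms are ignored.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stride-1 convolutions stacked
    sequentially, one layer per entry in `dilations`."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1
    pointwise convolution (bias ignored)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


# Assumed, illustrative settings (not taken from the thesis).
plain = receptive_field(3, [1, 1, 1])    # three ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 4])  # same depth, growing dilation

std = standard_conv_params(256, 256, 3)
sep = depthwise_separable_params(256, 256, 3)

print(f"receptive field: plain={plain}, dilated={dilated}")
print(f"parameters: standard={std}, separable={sep}, ratio={std / sep:.2f}")
```

Under these assumptions, the dilated stack covers 15 pixels per axis against 7 for the plain stack at the same depth and weight count, and the separable layer needs 67,840 weights against 589,824 for the standard layer.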
Keywords/Search Tags: Scene Text Location, Deep Learning, Dilated Convolution, Multi-scale Feature Fusion, Self-attention Mechanism