Research On Natural Scene Text Detection And Recognition Technology Based On Deep Learning

Posted on: 2024-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Lu
GTID: 2568307103969829
Subject: Computer technology
Abstract/Summary:
With the rise of artificial intelligence, computer vision has come to assist people with information processing tasks in many domains. Images are an important medium for disseminating information, and text is one of the most important sources of information within them. Research on detecting and recognizing text in images is therefore of considerable practical value. Regular scanned text images are relatively clear and their text features are easy to extract, which makes detection and recognition relatively straightforward. In natural scene images, however, complex backgrounds, inconsistent text structures, and other environmental factors make the task harder, and detection and recognition accuracy still needs to improve. This thesis focuses on the characteristics of natural scene text, considers the problem from several perspectives, and proposes improvements to current scene text detection and recognition methods. The main research contents are summarized as follows:

(1) To address the insufficient fusion of feature information across feature maps of different scales, and the weak expression of text features within those maps, a segmentation-based scene text detection model is proposed: the Text Detection Model based on Feature Pyramid Enhancement, Bottleneck Attention, and Differentiable Binarization (FB-DB). The FB-DB model first uses a ResNet residual network to extract deep image features. It then applies the Feature Pyramid Enhancement Module (FPEM), a cascadable feature pyramid enhancement component, to fuse feature information from feature maps of different scales. Next, the Bottleneck Attention Module (BAM) focuses on text features within the feature maps and strengthens their expression. Finally, the Differentiable Binarization (DB) structure sets the binarization threshold adaptively, which improves the accuracy of scene text detection.

(2) Traditional neural networks extract features insufficiently because they cannot distinguish the importance of feature information in different channels, and recognition models with too many parameters are large and slow to train. To address both problems, this thesis proposes a text recognition model based on a Squeeze-Excitation CNN and a GRU neural network (SECGNN). The feature extraction component is SECNN, a convolutional neural network with a squeeze-and-excitation structure. This structure learns a weight for each feature channel during training, which allows text features to be extracted more comprehensively. A two-layer bidirectional Gated Recurrent Unit (Bi-GRU) then learns the sequential information. Using Bi-GRU lets the model capture sequential patterns while reducing the number of parameters and the model size and maintaining high recognition accuracy, which also helps accelerate training. Finally, Connectionist Temporal Classification (CTC) transcribes the output sequence into text, which allows the model to handle variable-length outputs.

(3) This thesis presents implementations of the FB-DB model and the SECGNN model. The FB-DB model is evaluated in scene text detection experiments on the ICDAR2015 and MSRA-TD500 datasets, and the results show that it outperforms the compared methods in text detection accuracy. The SECGNN model is evaluated in scene text recognition experiments on the IIIT5K and ICDAR2003 datasets, and the results show that it achieves better text recognition performance than the compared approaches. Overall, the experiments on these datasets demonstrate the effectiveness of the FB-DB model for scene text detection and of the SECGNN model for scene text recognition.
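
The adaptive thresholding step in the DB structure can be illustrated with a small sketch. It assumes the commonly used differentiable binarization formulation, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability map, T is the learned per-pixel threshold map, and k is an amplification factor (50 is a typical choice). The abstract does not state which exact variant FB-DB uses, so the function name, the list-of-lists map representation, and the default k below are illustrative assumptions rather than the thesis's implementation.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binarization B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    The names and the default k are assumptions made for illustration.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 1x3 maps: one pixel above, one below, and one equal to its threshold.
prob = [[0.9, 0.2, 0.5]]
thresh = [[0.5, 0.5, 0.5]]
approx_binary = differentiable_binarization(prob, thresh)
```

Because the step function is replaced by a steep sigmoid, the threshold map can receive gradients and be learned jointly with the probability map, which is what makes the threshold adaptive.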
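
The CTC transcription step in the recognition model can be sketched in the same spirit. The example below shows greedy (best-path) CTC decoding: take the highest-scoring class at each time step, collapse consecutive repeats, then drop blanks. It assumes class index 0 is the blank symbol and that per-frame scores come from a recurrent layer such as the Bi-GRU. The abstract does not say which decoding strategy SECGNN uses, so the function name, the alphabet, and the blank index are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding over a list of per-frame class scores.

    Assumption: class 0 is the CTC blank and class i (i >= 1) maps to
    alphabet[i - 1]. This is an illustrative convention, not the thesis's.
    """
    blank = 0
    # Best class per time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not blank.
        if index != previous and index != blank:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


# Seven frames whose best path is [1, 1, 0, 1, 2, 2, 0] for alphabet "ab".
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.9, 0.05, 0.05],
]
text = ctc_greedy_decode(scores, "ab")
```

The blank between the second and fourth frames is what lets the repeated letter survive collapsing, and it is also why the output length can differ from the number of input frames.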
Keywords/Search Tags:Deep Learning, Text Detection, Text Recognition, Natural Scenes, Adaptive Thresholds, Squeeze-Excitation Networks