
Natural Scene Chinese Text Detection And Recognition Based On Deep Learning

Posted on: 2024-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: L M Yu
GTID: 2568307082480424
Subject: Mathematics
Abstract/Summary:
Natural scene text detection and recognition is an important research area in pattern recognition and computer vision, with wide applications in autonomous driving, image information retrieval, and photo translation. Text detection locates text regions in images and generates the corresponding bounding boxes, while text recognition reads the content within those boxes. Although the topic is an active one, most existing methods focus only on English text. Compared with English text images, Chinese text images in natural scenes are more challenging because of extreme aspect ratios, large scale variations, and diverse orientations. Chinese characters are also more numerous and structurally complex than English characters. Existing methods therefore cannot be applied directly to Chinese natural scene text detection and recognition, even though these tasks have extensive real-world applications. This thesis focuses on Chinese text detection and recognition and aims to propose more effective methods. The work covers the following two aspects.

For natural scene Chinese text detection, existing methods lose accuracy in complex scenarios. To locate text more accurately in such cases, this thesis proposes FFDA-DBNet, a deep neural network based on DBNet that combines FPEM, FFM, and DAMM. FPEM enhances features at different scales through top-down and bottom-up passes, and FFM fuses the resulting multi-scale features, which addresses missed and incorrect text detections. DAMM is introduced into FPEM to capture global semantic information more effectively through position attention and channel attention, which addresses non-tight text boxes. Experimental results on the ICDAR2019-LSVT-2500 dataset show that the proposed method achieves a better F-score and recall and, to some extent, reduces the occurrence of missed, incorrect, and non-tight text boxes.

For natural scene Chinese text recognition, existing methods still suffer from inaccurate recognition, owing to extremely low accuracy on vertical text. To improve the efficiency of Chinese text recognition in natural scenes, this thesis proposes a deep neural network based on SVTR-T that incorporates Conv-FEM. Compared with the self-attention feature extraction module of SVTR-T, which uses many attention matrices and softmax operations, Conv-FEM fuses features through convolution operations, giving a smaller parameter count and higher computational efficiency and thus faster inference. Experimental results on the Chinese Benchmark dataset show that the proposed method achieves better accuracy, minimum edit distance, and frame rate, and to some extent improves the accuracy of Chinese text recognition.
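
The abstract reports F-score for detection and edit distance for recognition but does not define either metric. As a rough illustration only, the sketch below assumes the F-score is the usual harmonic mean of precision and recall (F1) and that the edit distance is the standard Levenshtein distance between predicted and ground-truth strings. The function names `f_score` and `edit_distance` are hypothetical and do not come from the thesis.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edit_distance(pred: str, truth: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program.

    Counts the minimum number of single-character insertions, deletions,
    and substitutions needed to turn `pred` into `truth`.
    """
    prev = list(range(len(truth) + 1))  # distances from the empty prefix of pred
    for i, cp in enumerate(pred, start=1):
        curr = [i]  # turning pred[:i] into the empty string takes i deletions
        for j, ct in enumerate(truth, start=1):
            cost = 0 if cp == ct else 1
            curr.append(min(prev[j] + 1,          # delete cp
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Because Python strings are sequences of Unicode code points, `edit_distance` treats each Chinese character as a single symbol, which is the granularity at which a character-level recognizer would normally be scored.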
Keywords/Search Tags:deep learning, natural scene text detection and recognition, feature enhancement, dual attention mechanism, Vision Transformer