Research On Text Detection Method Of Natural Scene Based On Deep Learning

Posted on: 2021-08-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Wang
GTID: 1528307100474514
Subject: Electronic Science and Technology
Abstract/Summary:
Understanding and analyzing natural scenes has long been an active research topic in image processing, pattern recognition, computer vision, and related fields. As a special visual element, text in natural scenes often carries rich high-level semantic information that describes and supplements the scene content more precisely. Text detection in natural scenes therefore has high academic research value. It also has broad application prospects in fields such as autonomous driving, assistance for blind people, geographic information annotation, and robot automation, and can produce substantial social benefits.

Text detection in natural scene images and videos faces many challenges. First, unlike document text, text lines in natural scenes can appear in arbitrary orientations. Second, multilingual text at different scales requires a more robust detection algorithm. Finally, unfavorable factors during video capture, such as occlusion, uneven lighting, and severe shaking of the capture device, break up text regions, distort colors, and blur the image, which degrades the performance of detection algorithms. Using deep learning, this thesis studies these main challenges of text detection in natural scenes. The main research work and innovations are as follows:

1. To address multi-orientation text detection, this thesis proposes an arbitrary-orientation text detection algorithm based on a convolutional network with coarse-to-fine supervision, which takes the structural characteristics of the text region into account. The algorithm follows the idea of separation and combination. Starting from a coarse prediction of the text region, it obtains a fine character shape segmentation and a text centerline prediction. Segmentation results that share the same text centerline are grouped bottom-up to form the final detection results. This method locates the text region accurately, and predicting the centerline attribute ensures that the algorithm can detect text in arbitrary orientations. To improve semantic segmentation performance, the network design adopts a multi-scale feature pyramid structure in which high-level features are up-sampled and combined with shallow features layer by layer to enrich semantic information. In addition, a multi-level supervised learning method, which selects the supervision information appropriate to each learning task, is used to improve the generalization ability of the network. Experiments show that the algorithm can effectively detect arbitrary-orientation text in complex scenes.

2. To address multilingual text detection at different scales, this thesis proposes a multilingual text detection algorithm that combines precise text region segmentation with scale estimation. For precise text region segmentation, it proposes a new text region representation based on the text boundary, which can accurately separate small text objects and estimate multilingual text of arbitrary orientation and shape. Exploiting the relationship between image resolution and text region scale, the method also strengthens the multi-scale feature representation of text regions through image pyramid input, and estimates the scale of each text region in order to integrate the detection results from the different prediction images. In the network design, a parallel multi-scale feature fusion structure is used to obtain high-resolution feature representations, and a residual pooling module is added to further enrich background context information. The results show that the algorithm can effectively detect multilingual text in natural scenes and has certain advantages over mainstream algorithms.

3. To address the degradation of existing text detection algorithms caused by unfavorable factors in video capture, such as occlusion, uneven lighting, and severe shaking of the capture device, this thesis proposes a new video text detection algorithm based on layout-constrained tracking. The algorithm uses a detection-and-tracking framework: it detects the text in each video frame, tracks it across frames, and exploits the temporal redundancy of text in video to eliminate false detections and improve detection performance. To improve single-frame text detection in video, a new fast text detection network combined with semantic segmentation is proposed, which locates text regions accurately by enhancing the semantic information in the extracted features. To improve multi-text tracking, this thesis proposes a text tracking algorithm based on layout constraints, in which a new data association cost function uses the layout similarity between multiple text regions to model their relative positions. The tracking results are obtained by optimizing this function. The experimental results demonstrate the effectiveness of the proposed method for scene video text detection and tracking.
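As a rough illustration of what a layout-constrained data association cost might look like, the sketch below matches tracked text boxes to new detections. It is not the cost function of the thesis, which the abstract does not specify. Everything in it is an assumption made for illustration: axis-aligned boxes `(x1, y1, x2, y2)`, a unary term of `1 - IoU`, a pairwise term that penalizes changes in the offset between the centers of two text regions, the hypothetical names `association_cost` and `associate`, the weight `layout_weight`, and a brute-force search over assignments that is only practical for a handful of regions.

```python
from itertools import permutations
from math import hypot


def center(box):
    """Center point of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def association_cost(tracks, dets, assignment, layout_weight=0.01):
    """Cost of matching tracks[i] to dets[assignment[i]] (illustrative only).

    Unary term: 1 - IoU for each matched pair.
    Layout term: for every pair of tracks, how much the offset between their
    centers changes after matching (assumed form, not taken from the thesis).
    """
    cost = sum(1.0 - iou(tracks[i], dets[j]) for i, j in enumerate(assignment))
    for i in range(len(tracks)):
        for k in range(i + 1, len(tracks)):
            ti, tk = center(tracks[i]), center(tracks[k])
            di, dk = center(dets[assignment[i]]), center(dets[assignment[k]])
            dx = (tk[0] - ti[0]) - (dk[0] - di[0])
            dy = (tk[1] - ti[1]) - (dk[1] - di[1])
            cost += layout_weight * hypot(dx, dy)
    return cost


def associate(tracks, dets, layout_weight=0.01):
    """Brute-force minimizer; assumes len(dets) >= len(tracks) and few regions."""
    best = min(
        permutations(range(len(dets)), len(tracks)),
        key=lambda a: association_cost(tracks, dets, a, layout_weight),
    )
    return list(best)


# Two text regions shifted 50 px by camera shake, so no box overlaps its
# previous position and the IoU term alone cannot tell the matches apart.
tracks = [(0, 0, 10, 10), (100, 0, 110, 10)]
dets = [(150, 0, 160, 10), (50, 0, 60, 10)]  # listed in swapped order
print(associate(tracks, dets))
```

In this toy case every IoU is zero, so only the layout term separates the two possible assignments: the match that preserves the 100 px offset between the two regions has the lower cost. A practical tracker would replace the brute-force search with a proper assignment solver and would also handle text that appears or disappears between frames.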
Keywords/Search Tags: Natural Scene Text Detection, Deep Learning, Arbitrary Orientation, Multilingual Text, Video Text Detection and Tracking