Slightly Canonical Text Area Detection Based On Deep Learning

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: J L Liu
Full Text: PDF
GTID: 2518306566996729
Subject: Software engineering
Abstract/Summary:
The goal of deep learning-based text region detection is to locate and frame text regions in natural scene images. Images contain rich information, and text is among the most important. Accurate detection of text regions helps computers understand images, and it is widely used to detect and recognize text in videos, web pages, screenshots, express orders, tickets, cards, and certificates. Detection of linear text regions with simple backgrounds and arbitrary orientation is relatively mature: such regions can be located and framed by quadrilaterals. However, detecting text regions against complex backgrounds remains a major challenge. First, it is difficult to separate the text region from a complex background. Second, it is difficult to accurately determine the shape of a quadrilateral frame for fan-shaped or curved text regions. In essence, high-quality sets of text region pixels must be detected and extracted for both regular and irregular text regions in images. To address these problems, this thesis makes the following improvements:

1. To obtain high-quality text region pixel point sets from images, the network model in this thesis is divided into three branches. The first branch uses a VGG model with the fully connected layers removed, forming a fully convolutional network (FCN) that serves as the base network for extracting local features of the text region. The second branch adds global information to the model by modeling channel-to-channel relationships with the Squeeze-and-Excitation (SE) model. The third branch builds a feature pyramid from feature maps of different scales in the FCN and performs feature fusion (FF) across them to reduce the model's sensitivity to text regions of different sizes. Because the upsampling used to extract semantic information from the middle- and high-level convolution blocks loses information, cubic convolution interpolation is used to resize the feature maps and preserve the integrity of text information during feature map fusion. Finally, non-maximum suppression (NMS) is used to select the points whose scores exceed the threshold as the final text region point set.

2. A quadrilateral-based linear text box retrieval method and a polygon text box retrieval method based on the number of adaptive point sets are proposed; both retrieve text boxes from the pixel point sets produced by the model. First, the distance between each pixel in the text region and the left and right boundaries of the text box is calculated to determine whether the pixel lies within the distance range. The coordinates of the four corner points of the text region are then computed as a weighted average of the coordinates of the pixels within that range. Second, linear text regions of arbitrary orientation can be framed by quadrilaterals, whereas irregular text regions require the number of point sets to be calculated: the more a text region bends, the more points are needed, and the less it bends, the fewer. Finally, the text region is extracted by connecting the sorted pixel point sets in sequence.

The proposed FCN+FF+23SE text detection model can be applied to images of any size. It can detect linear text regions, regular fan-shaped text regions, and curved text regions. Compared with other models, the proposed method improves accuracy, recall, and speed on multiple datasets.
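The channel reweighting performed by the SE branch can be illustrated with a small sketch. It follows the generally published structure of a Squeeze-and-Excitation block (global average pooling, two fully connected layers, a sigmoid gate, then channel-wise scaling) and is not the thesis's implementation. The function name `se_reweight`, the weight arguments `w1`, `b1`, `w2`, `b2`, and the toy sizes are assumptions made for illustration; where the SE blocks sit inside the FCN is not specified in the abstract.

```python
import math


def se_reweight(feature_map, w1, b1, w2, b2):
    """Apply SE-style channel reweighting (illustrative sketch only).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1, b1: hypothetical weights of the first FC layer (C -> C // r).
    w2, b2: hypothetical weights of the second FC layer (C // r -> C).
    Returns (scaled_feature_map, gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(w_row, z)) + b)
        for w_row, b in zip(w1, b1)
    ]

    # Excitation, step 2: fully connected layer followed by a sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
        for w_row, b in zip(w2, b2)
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Toy input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
    fm = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    scaled, gates = se_reweight(
        fm, w1=[[1.0, 1.0]], b1=[0.0], w2=[[1.0], [-1.0]], b2=[0.0, 0.0]
    )
    print(gates)
```

Because each gate depends on the pooled statistics of all channels, the scaling carries image-wide (global) context into feature maps that are otherwise computed from local receptive fields, which is the role the abstract assigns to the second branch.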
Keywords/Search Tags: CNN, FCN, Squeeze-and-Excitation, Feature Fusion