
Research On Scene Text Detection Based On FCN And Feature Layer Fusion

Posted on: 2020-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2428330578454632
Subject: Computer technology
Abstract/Summary:
The text in an image carries direct meaning that is especially critical for understanding the content of a scene, and more and more intelligent applications use the text information in scene images. Existing scene text detection methods have many limitations because of varied fonts, arbitrary orientations, complex backgrounds, and illumination. Compared with traditional machine learning algorithms, deep learning algorithms perform better because they can learn deep features of text. Building on the idea of segments and links, two text detection models are proposed: one based on position regression and the other based on semantic segmentation.

(1) The characterization enhancement model based on feature layer fusion. The segment and link based model is not powerful enough for detecting small text, because its hierarchical feature structure provides insufficient semantic information for such text. A feature layer fusion architecture is used to solve this problem. First, the feature layers are upsampled using transposed convolution, and then they are fused layer by layer from back to front. The fused feature layers add enhanced global features while retaining high-resolution detail features, which provides more accurate position and boundary information for locating text. Compared with the segment and link based model, detection with the fused feature layers improves the F-value by 1.9% on ICDAR 2015 and by 1.6% on MSRA-TD500. After the feature layer fusion architecture is added, the deeper network makes error propagation difficult, which may degrade network performance. Therefore, a prediction architecture based on the residual network is designed; its skip ("leap") connection structure reduces the training difficulty of the network. On ICDAR 2015, the original training loss is reached with only 50% of the iterations, and the F-value is improved by 1.0%. A fragment grouping strategy based on rotation angle addresses the large error that arises when the segment and link based model locates curved text with a single rectangle. Splitting the curved text into pieces and fitting a separate rectangle to each piece effectively reduces the localization error.

(2) The candidate text filtering model based on semantic segmentation. A semantic segmentation architecture is designed to address the many false background detections produced by the characterization enhancement model. A Fully Convolutional Network (FCN) produces a text saliency map, and a connected component analysis algorithm then extracts the valid text regions. Filtering out candidate text whose text region proportion is below a threshold effectively eliminates some false detections, and the accuracy on ICDAR 2015 improves by 5.0%. A text balance strategy addresses the problem that the semantic segmentation loss is biased toward large text, which leads to poor discrimination of small text. By assigning the same weight to each text instance, it balances the model's classification performance across text sizes and strengthens its ability to discriminate small text; the F-value on MSRA-TD500 increases by 1.0%. In a comparison with state-of-the-art algorithms on different datasets, the F-value of the candidate text filtering model on ICDAR 2013 is 1.1% higher than that of PixelLink (Detecting Scene Text via Instance Segmentation).

Finally, a web-based scene text detection system is implemented by combining feature layer fusion and semantic segmentation. It supports two localization granularities, words and text lines, and can accurately locate text in some real-life scenes.
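The back-to-front feature layer fusion described in part (1) can be sketched on toy single-channel maps. This is a minimal illustration under stated assumptions, not the thesis implementation: each deeper layer is assumed to be exactly half the resolution of the one before it, upsampling uses a stride-2 transposed convolution with a 2x2 kernel, and fusion is assumed to be element-wise addition (the abstract does not say whether layers are added or concatenated). The helper names are hypothetical, and a real network would learn multi-channel kernels.

```python
from typing import List

Map = List[List[float]]  # a single-channel feature map, indexed [row][col]


def transposed_conv2x2(x: Map, kernel: Map) -> Map:
    """Stride-2 transposed convolution with a 2x2 kernel and no padding.

    Each input value scatters value * kernel into its own 2x2 output block,
    so an H x W map becomes 2H x 2W (blocks do not overlap because the
    stride equals the kernel size).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += x[i][j] * kernel[di][dj]
    return out


def add_maps(a: Map, b: Map) -> Map:
    """Element-wise sum of two same-shaped maps (the assumed fusion operation)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_back_to_front(layers: List[Map], kernel: Map) -> List[Map]:
    """Fuse feature layers from the deepest (back) to the shallowest (front).

    layers[0] is the shallowest, highest-resolution map and layers[-1] the
    deepest. The running fused map is upsampled and added to the next
    shallower layer. Returns the fused maps ordered from shallow to deep.
    """
    fused = [layers[-1]]
    for layer in reversed(layers[:-1]):
        upsampled = transposed_conv2x2(fused[0], kernel)
        fused.insert(0, add_maps(layer, upsampled))
    return fused


# Toy usage: layers of size 4x4, 2x2 and 1x1 with an all-ones kernel.
layers = [[[0.0] * 4 for _ in range(4)], [[1.0, 2.0], [3.0, 4.0]], [[10.0]]]
fused = fuse_back_to_front(layers, [[1.0, 1.0], [1.0, 1.0]])
print(fused[0])
```

The shallowest fused map ends up with the full 4x4 resolution while every cell also carries a contribution propagated down from the deepest 1x1 layer, which is the "global features plus high-resolution detail" combination the abstract describes.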
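Why a skip ("leap") connection eases training of the deeper fused network can be shown with a scalar toy model. This sketch is an assumption-laden illustration, not the thesis's residual prediction architecture: each layer is the hypothetical scalar map h -> relu(w*h), and the residual variant adds the identity path h -> relu(w*h) + h. The function `chain_gradient` is a hypothetical helper that multiplies the local derivatives along the stack.

```python
from typing import List


def relu(v: float) -> float:
    return max(0.0, v)


def chain_gradient(x: float, weights: List[float], residual: bool) -> float:
    """d(output)/d(input) through a stack of scalar layers h -> relu(w * h).

    With residual=True each layer computes relu(w * h) + h, so every local
    derivative gains a +1 term contributed by the identity (skip) path.
    """
    grad = 1.0
    h = x
    for w in weights:
        pre = w * h
        local = w if pre > 0 else 0.0  # derivative of relu(w * h) with respect to h
        if residual:
            local += 1.0               # the skip path passes the gradient through unchanged
            h = relu(pre) + h
        else:
            h = relu(pre)
        grad *= local
    return grad


# Ten layers with small weights: the plain stack's gradient shrinks
# geometrically, while the residual stack keeps it above 1.
plain = chain_gradient(1.0, [0.1] * 10, residual=False)
res = chain_gradient(1.0, [0.1] * 10, residual=True)
print(plain, res)
```

In the plain stack the gradient is the product of ten factors of 0.1, while in the residual stack each factor is 1.1, so the error signal reaching the early layers does not vanish. This is the intuition behind the faster convergence reported for the residual prediction architecture, not a reproduction of that result.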
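The rotation-angle fragment grouping for curved text might look roughly like the following. The abstract does not give the grouping rule, so everything here is an assumption: segments are taken to arrive already ordered along a linked text line, a new group starts when a segment's angle differs from the first segment of the current group by more than a hypothetical `max_delta_deg` threshold, and each group is then fitted with one rotated rectangle built from its mean angle. `Segment`, `group_by_angle`, and `group_rect` are illustrative names, and angle wrap-around near +/-90 degrees is ignored.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    cx: float         # centre x
    cy: float         # centre y
    width: float
    height: float
    angle_deg: float  # orientation, assumed to lie in (-90, 90)


def group_by_angle(segments: List[Segment], max_delta_deg: float = 15.0) -> List[List[Segment]]:
    """Split an ordered chain of segments into pieces of similar orientation.

    Comparing against the group's first segment (rather than the previous
    one) stops a slowly bending curve from drifting into a single group.
    """
    groups: List[List[Segment]] = []
    for seg in segments:
        if groups and abs(seg.angle_deg - groups[-1][0].angle_deg) <= max_delta_deg:
            groups[-1].append(seg)
        else:
            groups.append([seg])
    return groups


def group_rect(group: List[Segment]) -> Tuple[float, float, float, float, float]:
    """One rotated rectangle (cx, cy, length, height, angle_deg) per group."""
    angle = sum(s.angle_deg for s in group) / len(group)
    ux, uy = math.cos(math.radians(angle)), math.sin(math.radians(angle))
    mx = sum(s.cx for s in group) / len(group)
    my = sum(s.cy for s in group) / len(group)
    # Project segment centres onto the group direction to find its extent.
    proj = [(s.cx - mx) * ux + (s.cy - my) * uy for s in group]
    lo, hi = min(proj), max(proj)
    mid = (lo + hi) / 2
    length = (hi - lo) + max(s.width for s in group)  # pad by one segment width
    height = max(s.height for s in group)
    return mx + mid * ux, my + mid * uy, length, height, angle


# A toy curve whose orientation bends from 0 to 60 degrees.
segs = [Segment(float(i), 0.0, 1.0, 1.0, a)
        for i, a in enumerate([0.0, 2.0, 5.0, 30.0, 33.0, 60.0])]
for g in group_by_angle(segs):
    print(group_rect(g))
```

Each piece gets its own rectangle, so a strongly curved line is covered by several tight boxes instead of one loose box, which is the effect the abstract attributes to the strategy.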
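The candidate filtering step in part (2), FCN saliency map, then connected component analysis, then a text-region-proportion threshold, can be sketched as follows. The FCN itself is out of scope, so the saliency map is given directly as a grid of probabilities. The binarization threshold `prob_thresh`, the minimum component size `min_area`, the proportion threshold `ratio_thresh`, and the (x0, y0, x1, y1) box format are all hypothetical choices, and 4-connectivity is assumed for the components.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive


def text_mask(saliency: List[List[float]], prob_thresh: float = 0.5,
              min_area: int = 2) -> List[List[bool]]:
    """Binarize the saliency map and keep 4-connected components of >= min_area pixels."""
    h, w = len(saliency), len(saliency[0])
    binary = [[saliency[y][x] >= prob_thresh for x in range(w)] for y in range(h)]
    mask = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:  # tiny blobs are treated as noise
                for cy, cx in component:
                    mask[cy][cx] = True
    return mask


def filter_candidates(boxes: List[Box], mask: List[List[bool]],
                      ratio_thresh: float = 0.3) -> List[Box]:
    """Keep candidate boxes whose fraction of text pixels reaches ratio_thresh."""
    kept = []
    for x0, y0, x1, y1 in boxes:
        area = (x1 - x0) * (y1 - y0)
        if area <= 0:
            continue
        text_px = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
        if text_px / area >= ratio_thresh:
            kept.append((x0, y0, x1, y1))
    return kept


saliency = [
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.9],  # isolated pixel, removed by min_area
]
mask = text_mask(saliency)
print(filter_candidates([(0, 0, 3, 2), (3, 2, 6, 4), (0, 0, 6, 4)], mask))
```

The second candidate covers only background once the isolated pixel is discarded, and the large third candidate is mostly background, so both are dropped as false detections; only the first box survives.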
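One way to realize the text balance strategy, giving every text instance the same total weight in the segmentation loss, is a weighting similar to the instance-balanced cross-entropy described for PixelLink: with S text pixels spread over N instances, every pixel of instance i (area S_i) gets weight (S / N) / S_i. Whether the thesis uses exactly this formula is an assumption, as is the plain weight of 1.0 for background pixels. `instance_balanced_weights` and `weighted_bce` are hypothetical helpers; `labels` is an instance map with 0 for background and 1..N for text instances.

```python
import math
from collections import Counter
from typing import List


def instance_balanced_weights(labels: List[List[int]]) -> List[List[float]]:
    """Per-pixel weights so that every text instance sums to the same total weight."""
    counts = Counter(v for row in labels for v in row if v != 0)
    if not counts:
        return [[1.0] * len(row) for row in labels]
    per_instance = sum(counts.values()) / len(counts)  # S / N
    return [[1.0 if v == 0 else per_instance / counts[v] for v in row] for row in labels]


def weighted_bce(probs: List[List[float]], labels: List[List[int]],
                 weights: List[List[float]], eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy for text/non-text, normalized by total weight."""
    total, norm = 0.0, 0.0
    for prow, lrow, wrow in zip(probs, labels, weights):
        for p, label, w in zip(prow, lrow, wrow):
            y = 1.0 if label != 0 else 0.0
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            norm += w
    return total / norm


# A large instance (6 pixels) and a small one (2 pixels).
labels = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 2],
]
w = instance_balanced_weights(labels)
print(w[0][0], w[2][2])
```

Here S = 8 and N = 2, so the large instance's pixels get weight 4/6 and the small instance's pixels get 4/2 = 2.0; both instances contribute a total weight of 4, so errors on the small text are no longer drowned out by the large text.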
Keywords/Search Tags:Scene text detection, Feature layer fusion, Residual network, Semantic segmentation, Text balance