
Research On Text Detection Technology In Natural Scene Image

Posted on: 2020-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2428330623955811
Subject: Electronic and communication engineering
Abstract/Summary:
This paper starts from the technical difficulties of natural scene text detection and conducts in-depth research on localization algorithms for flexible text of arbitrary shape and for multi-scale text. By combining recent deep learning research with fixed-point acceleration of network models, two different natural scene text detection models are proposed. The details are as follows:

(1) The size of text regions in natural scenes varies enormously. A single feature layer is commonly used to predict text at all scales, and this generally performs poorly. This paper addresses the fact that targets of different scales require feature maps from different levels. Several progressively downsampled convolutional layers are added after the feature extraction network, and a U-shaped symmetric structure is used to directly concatenate, or re-convolve, the earlier convolutional layers in order to obtain multi-scale information. Secondly, the domain shift between the pre-trained model and the input image scale used for retraining in the detection algorithm is reduced, so that the target size seen during pre-training is as close as possible to the size of the text to be detected. This improves the performance of the text detection network without changing the network structure.

(2) Based on the Mask R-CNN algorithm, an improved text detection model with an Inception RPN module and an adaptive scale test mechanism is proposed. Inspired by Mask R-CNN, the detector handles text regions of arbitrary shape by generating a shape mask for each detected object and using the output of the mask branch to locate the text region. Instance segmentation algorithms lack global context and produce inaccurate classification scores in the region proposal network; to solve this, the Inception Region Proposal Network (Inception RPN) module and the adaptive scale test mechanism are designed. The Inception RPN module handles text of different aspect ratios and scales by using multiple branches with different convolution kernels, and fuses the context information of the convolutional feature maps at the positions corresponding to the candidate boxes, thereby obtaining higher-quality text candidate boxes. This module effectively avoids the accumulation of errors in bottom-up text candidate box generation and requires only a few hundred text candidate boxes to achieve a high recall rate. The adaptive scale test mechanism is motivated by the fact that scene text detection differs from general object detection: natural text usually varies greatly in size, aspect ratio, and orientation. To address this, the algorithm adaptively resizes the test image to a scale consistent with that of the backbone network's images, so as to obtain the maximum response. This further improves the accuracy of small-scale text detection while ensuring that large text is not falsely detected. Verified on standard datasets, the proposed algorithm achieves an F1 score of 0.90 on the ICDAR 2015 standard test set and an F1 score of 0.76 on the ICDAR 2017 MLT standard test set, which is higher than the best previously reported results.

(3) This paper proposes a text detection model based on a fully convolutional network with a U-shaped structure and a position-weighted loss function. Most high-precision text detection algorithms cannot be ported to portable devices with limited computing power. Inspired by the FCN algorithm, this paper improves the fully convolutional network to remove the preset box generation step required by traditional R-CNN networks, so that text detection and localization are performed directly in a single network. Specifically, a U-shaped structure is introduced to fuse, across multiple scales, the features generated by the feature extraction network. This addresses the information loss caused by repeated downsampling of the feature maps during computation and improves the robustness of the whole model on multi-scale targets. Secondly, by improving the position-weighted loss function and preprocessing the text annotations, text detection accuracy is improved to a certain extent without increasing the amount of computation. Verified on a standard dataset, the proposed algorithm achieves an F1 score of 0.93 on the ICDAR 2013 standard test set. Further, by converting the trained floating-point model to fixed point and using OpenCV to call the model for forward inference, the computational performance of the model is greatly improved, which lays a foundation for future industrial deployment.
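The U-shaped multi-scale fusion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the "direct splicing" variant, in which a coarse (downsampled) feature map is upsampled by a factor of two and concatenated channel-wise with the finer map from the matching level. The names `upsample2x` and `fuse`, the nearest-neighbour upsampling, and the toy feature values are all hypothetical choices made for illustration.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature channel
FeatureMap = List[Channel]    # a list of C channels


def upsample2x(channel: Channel) -> Channel:
    """Nearest-neighbour 2x upsampling of a single channel (assumed scheme)."""
    out: Channel = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(coarse: FeatureMap, fine: FeatureMap) -> FeatureMap:
    """Upsample the coarse map and concatenate it with the fine map channel-wise."""
    up = [upsample2x(c) for c in coarse]
    h, w = len(fine[0]), len(fine[0][0])
    for c in up:
        if len(c) != h or len(c[0]) != w:
            raise ValueError("coarse map must be exactly half the size of the fine map")
    return fine + up  # channel-wise concatenation


# Toy data (hypothetical): a 1-channel 4x4 fine map and a 1-channel 2x2 coarse map.
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
coarse = [[[10.0, 20.0], [30.0, 40.0]]]
fused = fuse(coarse, fine)
```

The fused map keeps the fine-level detail in its first channel and carries the coarser, larger-receptive-field information in the second, which is the property a U-shaped structure relies on to limit information loss from repeated downsampling. A real network would typically apply a further convolution to the concatenated channels.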
Keywords/Search Tags: Natural Scene Text Detection, Natural Scene Text Positioning, End-to-End Text Recognition