
The Research On Text Identification And Detection Algorithm Of Natural Scene Images

Posted on: 2020-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Yu
GTID: 2428330599459584
Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the multimedia era, a large volume and wide variety of information is generated and disseminated every day, and text is one of its most important carriers. It is therefore important for computers to detect text in various languages effectively in natural scenes, which helps them understand the high-level semantic information contained in images. Natural scene text detection has great research and application value, with broad prospects and demand in scenarios such as autonomous driving, intelligent navigation, visual interaction, and intelligent robots. Although a large number of models have been applied to natural scene text detection, most algorithms and models are based on traditional handcrafted feature extractors and shallow models, which cannot learn high-level semantic features or build relationships between them. Deep learning has succeeded in many research fields thanks to the powerful automatic feature learning and feature modeling capabilities of deep convolutional neural networks, and the research community has therefore begun to apply it to text detection.

This thesis focuses on the characteristics of natural scene text, such as large variations in scale and aspect ratio, confusing complex backgrounds, blurred images, distortion, and insufficient illumination, as well as on the drawbacks of existing algorithms. Building on existing theories and techniques, it proposes, constructs, and optimizes a natural scene text detection system based on deep convolutional neural networks, and verifies the effect of the text detection model through experiments. The main contributions of this thesis are as follows:

1. Processing all images and videos on the Internet would consume huge computational and storage resources. In addition, current text detectors are mostly language-dependent (script-based): the script of an image must be recognized before text can be detected with the appropriate detector. Therefore, a block-based text image filtering and script identification algorithm is constructed that merges these two tasks into a unified framework. Using a convolutional neural network, it quickly determines whether an image contains text regions. Once an image is determined to contain text regions, the script of those regions is also identified. Finally, the image is sent to the subsequent text detector, which performs accurate text detection. The proposed algorithm can effectively reduce the consumption of computational and storage resources without missing any information, helping the detection system reduce its parameters and run faster.

2. To handle the large variations in the scale and aspect ratio of text areas, a text detector is designed that detects local text areas and then combines them into a complete bounding box. Unlike previous text detectors that directly regress the whole bounding box, it first regresses partial local bounding boxes of a word or text line (called Segments) and then links them to form the complete bounding box of the word or text line. The proposed text detector is more robust to text regions of different scales and aspect ratios.

3. Two convolutional neural network sub-modules are proposed: the Guided Self-Attention Module (GSAM) and Inception Atrous Spatial Pyramid Pooling (IASPP). GSAM uses the powerful pixel-level classification ability of the fully convolutional network to give the network a “self-supervising” ability, so that it can quickly locate the areas of the picture that contain text, strengthening the response of the feature maps to text areas while suppressing the response to non-text areas. IASPP increases the adaptability of the network to text of different scales by changing the network's receptive field, which enables the network to better extract the features corresponding to text of different scales.

4. During training, Focal Loss is used to mitigate the “hard-easy” and “positive-negative” sample imbalance problems. “Associative Embedding” is also adopted to assist the process of linking segments together, which can reduce false detections.

The experimental results show that the running efficiency, detection performance, model simplicity, and robustness of the proposed algorithm are greatly improved compared with previous natural scene text detection algorithms.
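The role of Focal Loss in the training step can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names are hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are the values commonly used in the original Focal Loss formulation, assumed here because the abstract does not state the values used. The sketch shows the binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for a single prediction.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction (illustrative helper, not from the thesis).

    gamma down-weights easy examples (p_t close to 1);
    alpha balances positive (y == 1) against negative (y == 0) examples.
    """
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.9) and a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), binary_focal_loss(p, 1))
```

Relative to cross-entropy, the factor (1 - p_t)^gamma shrinks the loss of well-classified examples far more than that of misclassified ones, so the many easy background pixels contribute less to the gradient than the hard examples.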
Keywords/Search Tags:Natural Scene Text Detection, Deep Convolutional Neural Network, Text/Non-text Image Classification, Script Identification