Research On Scene Text Detection Method Based On Deep Learning

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: B P Xu
GTID: 2428330620965697
Subject: Computer Science and Technology
Abstract/Summary:
As electronic devices spread and Internet technology develops, more and more of the information people encounter in daily life is transmitted as images. Images carry rich information, and the text they contain plays an important role in image understanding. Detecting that text accurately helps both to recognize the words and to understand the image. Thanks to the rapid development of deep learning, scene text detection has made considerable progress, and many deep-learning-based algorithms can detect text in scene images effectively. However, these methods rely on large neural networks for feature extraction. As a result, their models are often very large, have many parameters, and detect slowly.

Scene text detection is an application-oriented technology. Practical applications usually need a model that detects text accurately and reliably, and that is also small and efficient to run. Conventional large scene text detection models often cannot meet these needs. Designing a compact text detection model that detects scene text efficiently has therefore become an important research question.

In recent years, scene text detection has attracted the attention of many researchers and has become one of the most active topics in image processing. On the one hand, it has great research and application value, with considerable potential in autonomous driving and augmented reality. On the other hand, many challenges remain. Scene text is distributed randomly, diversely, and irregularly, which makes it difficult to detect accurately. In addition, although deep-learning-based detection methods are effective, they are often too large to deploy in real production and everyday scenarios.

After surveying many deep-learning-based scene text detection models, this thesis proposes two new ones. Both are designed around the characteristics of scene text and the requirements of practical detection. The main work and contributions of this thesis are as follows:

1. Scene text is multi-oriented and multi-scale, has no fixed shape, and can appear anywhere in an image, so it is difficult to detect accurately with regular quadrilateral boxes. Yet accurate detection is essential for subsequent text recognition. To detect scene text both efficiently and accurately, this thesis proposes the Dual-Path Feature Fusion (DPFF) scene text detection model. DPFF extracts features with the lightweight network EfficientNet-b3 and fuses them along two branches to detect scene text. One branch uses a feature pyramid network structure to fuse features from different levels of the hierarchy. The other uses an atrous spatial pyramid pooling (ASPP) structure to enlarge the receptive field. The feature maps of the two branches are then fused. This captures more features for only a small increase in computation and compensates for the limited feature-extraction capacity of a small network. Finally, a progressive expansion algorithm is applied to the segmentation maps to obtain the final detection result. Experiments on three public datasets show that DPFF detects diverse scene text effectively and also has the advantage of a smaller and faster model.

2. Most text in natural scenes is fairly regular in shape but appears in multiple orientations. For this case, the thesis presents Light-EAST, a scene text detection model based on EAST. Light-EAST uses the lightweight network MoGA-A as its backbone to extract features. It then builds two feature pyramid networks, one top-down and one bottom-up, which run in parallel and are then merged. The top-down pyramid is more sensitive to small objects and the bottom-up pyramid to large objects. Fusing them therefore lets the two complement each other and helps the model detect scene text of varying scales. Finally, the model uses the fused features to predict the vertex coordinates of each text box. It then applies the Non-Maximum Suppression (NMS) algorithm, which suppresses overlapping lower-scoring boxes and keeps the higher-scoring ones, to obtain the final text detection boxes. Experimental results show that Light-EAST detects multi-oriented scene text efficiently.
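DPFF's ASPP branch is said to enlarge the receptive field for only a small increase in computation. The arithmetic behind that claim is easy to check. A k x k convolution with dilation rate r spans k + (k - 1)(r - 1) pixels per side, yet it still uses only k² weights per input/output channel pair. The sketch below is illustrative only. It assumes 3 x 3 kernels and the rates (1, 6, 12, 18) familiar from DeepLab-style ASPP, because the abstract does not give DPFF's actual configuration. The function names are also hypothetical.

```python
from typing import List, Tuple

# Receptive-field arithmetic for the parallel dilated (atrous) convolutions of
# an ASPP-style branch. The default rates are an assumption borrowed from
# DeepLab-style ASPP; they are not values stated for DPFF.


def effective_kernel(k: int, rate: int) -> int:
    """Side length spanned by a k x k kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)


def conv_output_size(n: int, k: int, rate: int = 1, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a dilated convolution applied to an n x n input."""
    return (n + 2 * padding - effective_kernel(k, rate)) // stride + 1


def aspp_branches(n: int, k: int = 3,
                  rates: Tuple[int, ...] = (1, 6, 12, 18)) -> List[Tuple[int, int, int, int]]:
    """For each rate return (rate, span, weights per channel pair, output side).

    A padding of rate * (k - 1) // 2 (odd k) keeps the spatial size unchanged,
    so the branch outputs can be concatenated channel-wise afterwards.
    """
    rows = []
    for r in rates:
        pad = r * (k - 1) // 2
        rows.append((r, effective_kernel(k, r), k * k, conv_output_size(n, k, r, 1, pad)))
    return rows


if __name__ == "__main__":
    for rate, span, weights, out in aspp_branches(n=40):
        print(f"rate={rate:2d} span={span:2d} weights={weights} output={out}x{out}")
```

For a 40 x 40 feature map and a padding equal to the rate, every branch keeps a 40 x 40 output. The single-layer span therefore grows from 3 to 37 pixels while the weight count stays at 9, and the outputs stay aligned for fusion with the feature pyramid branch.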
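The abstract only names a "progressive expansion algorithm" for turning DPFF's segmentation maps into text instances. One well-known formulation is the progressive scale expansion used by PSENet. It labels the connected components of the smallest predicted kernel, then grows them outward through each larger kernel map with breadth-first search, so adjacent text lines stay separated. The following is a minimal sketch of that PSENet-style procedure on binary maps given as nested lists, assuming the kernels are ordered from smallest to largest. It is not taken from the thesis, and the function names are hypothetical.

```python
from collections import deque
from typing import List

Grid = List[List[int]]
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask: Grid) -> Grid:
    """Label 4-connected components of a binary mask (0 stays background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_expand(kernels: List[Grid]) -> Grid:
    """Grow the components of kernels[0] through each larger kernel map in turn."""
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the BFS with every pixel labelled so far.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # The first label to reach an unlabelled text pixel claims it.
                if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


if __name__ == "__main__":
    small = [[0] * 6, [0, 1, 0, 0, 1, 0], [0] * 6]  # two separate kernels
    full = [[1] * 6 for _ in range(3)]              # one merged text region
    for row in progressive_expand([small, full]):
        print(row)
```

In the demo, the largest map is a single connected text region, so a plain connected-component pass would report one instance. Expanding from the two small kernels instead yields two instances, and each row comes out as [1, 1, 1, 2, 2, 2]. Pixels that are equally far from two kernels go to whichever component the queue reaches first.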
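Light-EAST builds a top-down and a bottom-up feature pyramid in parallel from the backbone features and then merges them. The toy sketch below shows only that data flow, on single-channel maps stored as nested lists. It rests on several simplifying assumptions, none of which the abstract specifies:

- the lateral 1 x 1 and smoothing 3 x 3 convolutions are omitted;
- nearest-neighbour upsampling and 2 x 2 average pooling stand in for the learned resampling layers;
- the two pyramids are merged by element-wise addition.

```python
from typing import List

Map = List[List[float]]  # one single-channel feature map


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def upsample2(m: Map) -> Map:
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2(m: Map) -> Map:
    """2x2 average pooling with stride 2 (a stand-in for a stride-2 conv)."""
    return [[(m[y][x] + m[y][x + 1] + m[y + 1][x] + m[y + 1][x + 1]) / 4
             for x in range(0, len(m[0]), 2)]
            for y in range(0, len(m), 2)]


def top_down(feats: List[Map]) -> List[Map]:
    """feats[0] is the finest level; each later level halves the resolution."""
    out = [feats[-1]]
    for f in reversed(feats[:-1]):
        out.append(add(f, upsample2(out[-1])))
    return out[::-1]


def bottom_up(feats: List[Map]) -> List[Map]:
    """Pass fine-level information upward to the coarser levels."""
    out = [feats[0]]
    for f in feats[1:]:
        out.append(add(f, downsample2(out[-1])))
    return out


def bidirectional_fuse(feats: List[Map]) -> List[Map]:
    """Run both pyramids on the same backbone maps and merge them per level."""
    return [add(p, n) for p, n in zip(top_down(feats), bottom_up(feats))]


if __name__ == "__main__":
    c2 = [[1.0] * 4 for _ in range(4)]  # finest level (hypothetical name)
    c3 = [[2.0] * 2 for _ in range(2)]
    c4 = [[4.0]]                        # coarsest level
    for level in bidirectional_fuse([c2, c3, c4]):
        print(level)
```

Take constant inputs of 1.0, 2.0 and 4.0 at the 4 x 4, 2 x 2 and 1 x 1 levels. The top-down path carries the coarse value down, giving 7.0 at the finest level. The bottom-up path carries the fine value up, giving 7.0 at the coarsest level. After merging, the levels hold 8.0, 9.0 and 11.0, so every level mixes information from both ends of the hierarchy.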
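The final step of Light-EAST predicts the vertex coordinates of each text box and filters the candidates with Non-Maximum Suppression. This sketch assumes EAST's QUAD geometry, in which each positive pixel on a 1/4-resolution output map predicts eight offsets from its position to the four vertices. The candidate quadrilaterals are then sorted by score. A box is kept only if it does not overlap any already-kept box above an IoU threshold. Quadrilateral IoU is computed with Sutherland-Hodgman clipping, which is valid for convex quads, and the shoelace formula. The stride of 4, the 0.3 threshold and all names are assumptions. Note that the original EAST paper uses a locality-aware variant of NMS, whereas this sketch shows plain greedy NMS as the abstract describes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Quad = List[Point]


def decode_quad(x: int, y: int, offsets: Sequence[float], stride: int = 4) -> Quad:
    """Turn 8 offsets (dx1, dy1, ..., dx4, dy4) predicted at map cell (x, y)
    into four image-space vertices. The stride of 4 is an assumption."""
    px, py = x * stride, y * stride
    return [(px + offsets[2 * k], py + offsets[2 * k + 1]) for k in range(4)]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    n = len(poly)
    return sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
               for i in range(n)) / 2.0


def _cross(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: Sequence[Point], clip: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: the part of `subject` inside convex CCW polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, output = output, []
        for j in range(len(inp)):
            prev, cur = inp[j - 1], inp[j]
            c_prev, c_cur = _cross(a, b, prev), _cross(a, b, cur)
            if (c_prev >= 0) != (c_cur >= 0):
                # Edge prev -> cur crosses the clip line: keep the crossing point.
                t = c_prev / (c_prev - c_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if c_cur >= 0:
                output.append(cur)
    return output


def quad_iou(q1: Sequence[Point], q2: Sequence[Point]) -> float:
    """IoU of two convex quadrilaterals with vertices in either winding order."""
    a = list(q1) if signed_area(q1) >= 0 else list(q1)[::-1]
    b = list(q2) if signed_area(q2) >= 0 else list(q2)[::-1]
    inter = clip_convex(a, b)
    inter_area = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(signed_area(a)) + abs(signed_area(b)) - inter_area
    return inter_area / union if union > 0 else 0.0


def nms(quads: Sequence[Quad], scores: Sequence[float], iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    overlaps every already-kept box by at most `iou_threshold`."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(quad_iou(quads[i], quads[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    a = [(0, 0), (10, 0), (10, 10), (0, 10)]
    b = [(1, 0), (11, 0), (11, 10), (1, 10)]  # shifted copy of a, IoU = 90/110
    c = [(20, 20), (30, 20), (30, 30), (20, 30)]
    print(nms([a, b, c], [0.9, 0.8, 0.7]))  # [0, 2]
```

Because the IoU is computed on polygons rather than axis-aligned rectangles, rotated candidates are compared correctly. For example, a 45-degree diamond inscribed in a 10 x 10 square has an IoU of 0.5 with it.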
Keywords/Search Tags: deep learning, feature fusion, text detection, natural scene images, lightweight neural network