Text Detection In Complex Scene Based On Multi-scale Information Preservation

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2428330620963104
Subject: Computer technology
Abstract/Summary:
Text is widespread in natural scene images. Compared with other objects in an image (such as flowers or buildings), text carries strong logical structure and rich expressive content, so it can provide high-level semantic information effectively. Automatic and effective processing of text in natural scene images is therefore important for areas such as industrial automation, web retrieval, and scene analysis. Text is a key element in how we understand natural scenes, and scene text detection is already used to solve many practical vision problems, so studying it has considerable practical value.

However, text in natural scenes is affected by objective factors such as shooting angle and lighting, and the way the text is arranged makes detection harder still. Compared with traditional optical character recognition (OCR) methods, deep learning methods achieve better detection results, but most current deep learning methods are models borrowed directly from general object detection and are not tailored to text. Detailed information is easily lost in cascaded convolution operations, which leads to false detections and missed detections. Detecting text in natural scene images therefore remains a very challenging task. Building on deep learning, this work focuses on preserving as much detailed information as possible during text detection, through the following two contributions.

(1) An end-to-end detection method for complex scene text based on an attention mechanism is proposed. Inspired by the human visual attention mechanism, we introduce a visual attention layer into the VGG16 backbone so that the network can distinguish the importance of features at different levels, imitating the way humans quickly locate a target of interest (text) in a complex scene and give priority to those important regions. The best insertion position for the visual attention layer is determined experimentally. This module strengthens the network's sensitivity to text regions, addresses the inability of general network structures to focus on the features that matter for text detection, and preserves detailed text information as far as possible. In addition, we use locality-aware non-maximum suppression (NMS) to refine the positions of the text boxes, which increases running speed and reduces the amount of computation. Experiments show that the proposed method reduces false detections and missed detections and that both recall and precision improve significantly.

(2) An attention network model that combines local and non-local feature information is proposed for text detection in complex scenes. General network structures usually obtain local and global feature information only through repeated convolution operations and do not treat features differently according to their position and importance. Unlike the usual top-down cascaded structure, we extract low-level and high-level feature information in parallel on top of the original convolution operations, apply a different computation mechanism to each of the two types of information, and design fusion strategies for local and non-local features at different levels, so that the important features of each layer are effectively strengthened. The experiments show that our network correctly detects text regions in complex scenes and reduces false detections and missed detections.

This thesis systematically studies the loss of detail in text regions during text detection in complex scenes, and explores the construction of text detection models, visual attention, and the relationship between features from different convolutional layers. The ideas developed here may offer a new perspective for subsequent research on scene text detection and related applications.
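
The post-processing step mentioned in contribution (1), local perception (locality-aware) non-maximum suppression, can be illustrated with a small sketch. This is not the thesis implementation: it assumes axis-aligned boxes given as `(x1, y1, x2, y2, score)` with positive scores, candidates arriving in row-major scan order, and a hypothetical IoU threshold of 0.3. The function names `iou`, `weighted_merge`, `standard_nms`, and `locality_aware_nms` are chosen here for illustration only. The idea is to first merge consecutive, heavily overlapping candidates by score-weighted averaging, and only then run ordinary NMS on the much shorter merged list.

```python
from typing import List, Optional, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2, score), axis-aligned.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a: Box, b: Box) -> Box:
    """Average the coordinates weighted by score; the scores are summed.

    Assumes both scores are positive, so the total is never zero.
    """
    total = a[4] + b[4]
    x1, y1, x2, y2 = ((a[i] * a[4] + b[i] * b[4]) / total for i in range(4))
    return (x1, y1, x2, y2, total)


def standard_nms(boxes: List[Box], thresh: float) -> List[Box]:
    """Greedy NMS: keep a box only if it overlaps no higher-scoring kept box."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thresh for k in kept):
            kept.append(box)
    return kept


def locality_aware_nms(boxes: List[Box], thresh: float = 0.3) -> List[Box]:
    """Merge consecutive overlapping boxes in one pass, then apply greedy NMS."""
    merged: List[Box] = []
    prev: Optional[Box] = None
    for box in boxes:  # assumed to be in row-major scan order
        if prev is not None and iou(prev, box) > thresh:
            prev = weighted_merge(prev, box)
        else:
            if prev is not None:
                merged.append(prev)
            prev = box
    if prev is not None:
        merged.append(prev)
    return standard_nms(merged, thresh)
```

Because neighbouring pixels in a text region tend to predict nearly identical boxes, the merging pass usually shrinks the candidate list sharply before the quadratic greedy step runs, which is consistent with the speed and computation benefits the abstract attributes to this step.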
Keywords/Search Tags: Attention mechanism, text detection, convolutional neural network