
Research On Deep Learning Based Scene Text Understanding

Posted on: 2020-12-14    Degree: Master    Type: Thesis
Country: China    Candidate: J Lu
GTID: 2428330623951441    Subject: Software engineering
Abstract/Summary:
Words are a form of wealth created by humankind and a symbol of human civilization. Since ancient times, words have played a vital role in people's daily lives, and text carries abundant information. With the development of big data and artificial intelligence, there is a growing need to understand the text contained in large numbers of images, so text understanding has become increasingly important. In natural scenes, text understanding generally consists of two tasks: text detection and text recognition. Research in this field has produced many advances in recent years, but text understanding in natural scene images still faces many challenges, such as noise, blur, and distortion. This thesis therefore studies these open problems in depth. For the detection and recognition tasks of natural scene text understanding, we combine the characteristics of natural scene text with recent deep learning techniques and make the following contributions.

Scene text detection, which aims to locate text in an image, is a crucial computer vision task and has gained increasing attention. Many existing methods treat all pixels as equally important, ignoring the fact that pixels near the border matter more for predicting bounding boxes. In addition, once the boundary of an instance is marked, the position of the entire instance is easy to determine. Inspired by how humans annotate text, we propose a location sensitive deep network (LSDN) for scene text detection. More specifically, when the model predicts text/non-text for words and text lines, we guide it with edge position information, so that it pays more attention to hard-to-distinguish edges and ignores less important interior positions. Moreover, LSDN speeds up model convergence and improves text detection accuracy. Experimental results show that LSDN is competitive in both accuracy and efficiency; in particular, it achieves state-of-the-art results on MSRA-TD500 and ICDAR2015.

Natural scene text recognition automatically recognizes the characters in the text regions located by the detection task. This thesis analyzes the characteristics of capsule networks and finds that they can effectively extract the positional and sequential information of features. We therefore propose Caps, a capsule-network-based text recognition algorithm consisting of a CapsNet image encoding module and a CTC text decoding module. First, the image encoding module exploits the spatial variability of the capsule network and its vector-valued capsule features to extract a sequence of feature vectors from the image; a fully connected (FC) layer then produces the predicted label sequence; finally, the CTC transcription layer decodes this prediction into the output text label sequence. Caps was evaluated on synthetic datasets and on generated outdoor images. The experimental results show that the proposed method is highly competitive and confirm the advantages of capsule networks for text sequence tasks.
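The abstract does not give the exact LSDN loss or the Caps decoder, so the following is only a minimal pure-Python sketch of the two ideas it describes, under stated assumptions. The first part illustrates border-sensitive supervision: text pixels that touch background get a larger weight in a pixel-wise binary cross-entropy. The 4-neighbour border rule, the `border_weight` value, and the helper names `border_weight_map` and `weighted_bce` are hypothetical, not the thesis's formulation. The second part, `ctc_greedy_decode`, is standard best-path CTC decoding of per-frame class scores (such as an FC layer's output), assuming class index 0 is the CTC blank.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def border_weight_map(mask: List[List[int]], border_weight: float = 2.0) -> Grid:
    """Per-pixel loss weights that emphasise text-instance borders.

    Hypothetical rule (not the thesis's exact scheme): a text pixel (mask == 1)
    with a 4-neighbour that is background or outside the image gets
    `border_weight`; every other pixel gets weight 1.0.
    """
    h, w = len(mask), len(mask[0])
    weights = [[1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                    weights[y][x] = border_weight
                    break
    return weights


def weighted_bce(probs: Grid, mask: List[List[int]], weights: Grid,
                 eps: float = 1e-7) -> float:
    """Weighted mean of pixel-wise binary cross-entropy for a text/non-text map."""
    total, norm = 0.0, 0.0
    for p_row, m_row, w_row in zip(probs, mask, weights):
        for p, m, wt in zip(p_row, m_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -wt * (m * math.log(p) + (1 - m) * math.log(1.0 - p))
            norm += wt
    return total / norm


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    prev = None
    for scores in frame_scores:
        label = max(range(len(scores)), key=scores.__getitem__)
        # A label is emitted only when it differs from the previous frame,
        # so a blank between two identical labels keeps both of them.
        if label != blank and label != prev:
            decoded.append(label)
        prev = label
    return decoded
```

For example, per-frame argmax labels `1, 1, 0, 1, 2` decode to `[1, 1, 2]`: the repeated `1` is merged, and the blank separates the two genuine `1`s. Greedy decoding is the simplest inference rule; beam search over the CTC outputs is a common alternative, and training would normally use a framework loss such as PyTorch's `torch.nn.CTCLoss`.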
Keywords/Search Tags: Deep learning, convolutional neural network, text detection, text recognition, capsule network