
End-to-End Scene Text Recognition

Posted on: 2020-07-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y Chen  Full Text: PDF
GTID: 2428330575955155  Subject: Computer Science and Technology
Abstract/Summary:
End-to-end scene text recognition is an important task in computer vision. It consists of detecting text regions in an image and recognizing the words inside those regions. Traditional methods split the task into two parts, a text detection part and a text recognition part. The detection part predicts text locations; the corresponding areas are then cropped and fed into the recognition part to obtain the recognition results. In recent years, a class of works has combined the two parts in a single end-to-end network, in which both tasks are trained jointly through a shared feature layer. However, existing end-to-end text recognition networks still have several problems. In this thesis, we propose new deep networks to address them.

Firstly, existing end-to-end text recognition networks use the same shared feature map to connect the detection and recognition parts, without taking into account that the two parts require features with different characteristics. We introduce a gate-based feature filtering mechanism into an existing end-to-end text recognition network so that features for the recognition part are selected adaptively. Experiments show that the training procedure of our network is more stable and that it achieves better accuracy than state-of-the-art structures.

Secondly, the features fed into the recognition part of existing end-to-end networks are rotation-sensitive: input text regions with different rotation angles produce different features for the recognition part, which harms training. In this thesis, we study the effect of rotation-sensitive features on end-to-end text recognition networks and design a new feature mapping structure that provides rotation-invariant features to the recognition part. Experiments show that this structure is better suited to processing text regions with different rotation angles, and that our network achieves better accuracy than networks that use rotation-sensitive recognition features.

Finally, in existing end-to-end networks, the text region boundaries predicted by the detection part are often inaccurate, which degrades the performance of the recognition part. We design a novel algorithm based on nearest-neighbor correlation to refine the detection boundaries. To the best of our knowledge, this thesis is the first to introduce the idea of nearest-neighbor correlation into an end-to-end text recognition framework. Experiments show that this algorithm improves recognition accuracy, especially in the lexicon-free setting, and achieves better end-to-end recognition accuracy than existing networks.
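
As an illustration of what gate-based feature filtering can look like, the sketch below applies a per-channel sigmoid gate to a shared feature vector before it would reach a recognition branch. The abstract does not specify the actual gate structure, so the function name gate_features, the linear form of the gate, and all parameters here are assumptions made for illustration only, not the thesis's implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(shared, weights, bias):
    """Filter a shared feature vector with a learned per-channel gate.

    All names are hypothetical; this is a generic gating sketch.
    shared  : list of floats, one value per feature channel
    weights : square matrix (list of rows) mapping `shared` to gate logits
    bias    : list of floats, one gate bias per channel
    """
    gated = []
    for row, b, value in zip(weights, bias, shared):
        # Gate logit for this channel depends on the whole shared feature.
        logit = sum(w * s for w, s in zip(row, shared)) + b
        # A gate near 1 passes the channel through; near 0 suppresses it.
        gated.append(sigmoid(logit) * value)
    return gated
```

With zero weights and zero biases every gate equals 0.5, so each channel is halved; a strongly positive bias passes a channel almost unchanged, and a strongly negative bias suppresses it. In a real network the gate parameters would be learned jointly with the detection and recognition losses.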
Keywords/Search Tags: Computer Vision, Scene Text Detection, Scene Text Recognition, End-to-End Scene Text Recognition