An End-to-end Multi-angle Scene Text Detection And Recognition Method

Posted on: 2021-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Chen
Full Text: PDF
GTID: 2428330602476864
Subject: Control engineering
Abstract/Summary:
Text in natural scenes carries a great deal of information and provides a basic means of interacting with the environment. Scene text detection is difficult because scene text varies widely in scale, aspect ratio, and orientation. This thesis combines Feature Pyramid Networks (FPN) with the Single Shot Detector (SSD) framework to handle text at different scales, and links locally detectable elements (segments) to detect text with different orientations and aspect ratios. Compared with SSD, the deep feature maps are enlarged so that large text is located better and small text is identified more accurately. For text recognition, a recognition module built on residual blocks (ResNet) and an attention mechanism mitigates exploding and vanishing gradients during training, predicts long character sequences effectively, and improves the recognition rate. To deal with text of different scales, aspect ratios, and orientations, an end-to-end scene text detection and recognition method is proposed.

The work of this thesis is as follows: (1) By combining Feature Pyramid Networks with segment linking, scene text with different scales and orientations can be detected effectively. (2) Combining a deeper feature pyramid with the SSD design effectively addresses text detection at different scales, especially for small text. (3) Because an SSD-style detector is used, the proposed text detection method is very efficient. (4) A deep bidirectional recurrent network (Bi-LSTM) with residual connections encodes the text sequence features, and its output is used as a series of text proposals. Finally, text recognition is completed by a decoder that introduces an attention mechanism into the connectionist temporal classification (CTC) loss.

Adding a residual module to the classic deep bidirectional recurrent network accelerates convergence and makes the network easier to train. Adding an attention mechanism to the CTC loss makes the system attach more importance to the relevant parts of the input than to the irrelevant parts, avoids additional alignment preprocessing and subsequent grammatical processing of the labels, and assigns different weights to different parts of the sequence during recognition, thereby improving the recognition rate.

The proposed method is evaluated on the classic text detection and recognition data sets ICDAR2013 and ICDAR2015. The evaluation results show that the recognition accuracy of the proposed method is above 90% on average and that it is robust to multiple angles and to different scales and aspect ratios. The work is a further exploration of multi-angle text recognition and a useful extension of the applications of scene text recognition.
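
To illustrate why a CTC-based recognizer needs no explicit character alignment, the following minimal sketch shows standard best-path (greedy) CTC decoding: take the highest-scoring label in each frame, merge consecutive repeats, and drop blanks. It is not the attention-augmented decoder of the thesis. The function names, the blank index 0, and the toy alphabet are assumptions made for the example.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Highest-scoring label for every time step (frame).
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded: List[str] = []
    previous = blank
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat of the
        # previous frame; a blank between two equal labels keeps both.
        if label != blank and label != previous:
            decoded.append(alphabet[label - 1])  # labels 1..N map to alphabet[0..N-1]
        previous = label
    return "".join(decoded)


def one_hot(index: int, size: int = 4) -> List[float]:
    """Hypothetical helper that builds a toy per-frame score vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Frame labels 1,1,0,1,2,0,0,3 over the alphabet "abc" (0 is blank).
frames = [one_hot(i) for i in [1, 1, 0, 1, 2, 0, 0, 3]]
print(ctc_greedy_decode(frames, "abc"))  # prints "aabc"
```

The blank between the two runs of label 1 is what lets the decoder output a doubled character, so the network can emit a variable-length string from a fixed number of frames without per-character position labels.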
Keywords/Search Tags:Feature Pyramid Networks, Multi-angle, Bi-directional Long Short-Term Memory, ResNet, Attention, Text recognition