Research Of Scene Text Recognition Based On Encoder-decoder Architecture

Posted on:2022-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:X C Du

Full Text:PDF

GTID:2518306752454134

Subject:Computer technology

Abstract/Summary:

With the development of big data and deep learning, text image recognition has come to play an important role in people's daily lives. This thesis focuses on the text recognition task and improves the encoder-decoder-based text image recognition model. Specifically, in the encoding stage it adopts an attention-based feature extraction module to extract visual features and a temporal convolutional network to model the feature sequence; a multi-level feature aggregation mechanism aggregates information from different levels; and in the decoding stage a heuristic local attention mechanism decodes the character sequence. Experiments show that the proposed model achieves better performance.

Firstly, the visual features of text images play an essential role in scene text recognition (STR). Therefore, this thesis extracts visual features with a feature extraction module based on channel and spatial attention. The channel attention module and the spatial attention module enhance the features at the channel level and the spatial level, respectively. Extensive evaluations show that this module yields more robust features, which improves the performance of the model.

Secondly, this thesis adopts a Temporal Convolutional Network (TCN) to model the feature sequence. Compared with an RNN, a TCN can not only process sequence features in parallel but also mitigate vanishing and exploding gradients through its residual structure. The parameters within each TCN layer are shared, and the information of each time step does not need to be saved. More importantly, the TCN has a more flexible receptive field: the number of layers, the convolution kernel size, and the dilation factor can be chosen to suit different scenarios.

Thirdly, a multi-level aggregation mechanism is proposed to extend the standard encoder-decoder architecture by capturing visual features at different levels. The standard architecture uses only the deepest visual features for sequence modeling, which causes the feature vectors to degenerate as the receptive field keeps expanding. The proposed mechanism therefore aggregates the visual features of different layers to improve the performance of the model.

Finally, a decoder based on a heuristic local attention mechanism is applied to decode the character sequence. For scene text recognition, it is important to obtain the features most relevant to the character being decoded at the current time step. Therefore, this thesis explores a variety of existing local attention methods and provides complete comparison results. In addition, inspired by existing local attention mechanisms, it introduces two heuristic-based local attention mechanisms. Extensive experiments show that the heuristic-based monotonic local attention mechanism achieves the best results.
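The abstract does not spell out the thesis's heuristics, so the following is only a minimal sketch of the general idea of monotonic local attention: at each decoding step the softmax is restricted to a small window of encoder frames, and the window center moves left to right. The function names, the linear rule for the center, and the `half_width` parameter are all illustrative assumptions, not the author's method.

```python
import math


def local_attention_weights(scores, center, half_width):
    """Softmax over `scores`, restricted to the window
    [center - half_width, center + half_width]; frames outside get weight 0.
    """
    n = len(scores)
    lo = max(0, center - half_width)
    hi = min(n - 1, center + half_width)
    window = scores[lo:hi + 1]
    peak = max(window)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in window]
    total = sum(exps)
    weights = [0.0] * n
    for offset, e in enumerate(exps):
        weights[lo + offset] = e / total
    return weights


def monotonic_center(step, num_steps, num_frames):
    """Hypothetical heuristic (an assumption for illustration): place the
    window center by sliding linearly across the frames as decoding advances.
    """
    if num_steps <= 1:
        return 0
    return round(step * (num_frames - 1) / (num_steps - 1))


if __name__ == "__main__":
    frame_scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.7, 0.5, 0.2, 0.6]
    for step in range(5):
        center = monotonic_center(step, 5, len(frame_scores))
        weights = local_attention_weights(frame_scores, center, half_width=1)
        print(step, center, [round(w, 3) for w in weights])
```

Restricting the softmax to a window keeps each character's context vector focused on nearby frames, which is the motivation the abstract gives for local attention; how the window is actually positioned in the thesis may differ from the linear rule assumed here.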

Keywords/Search Tags:

Scene Text Recognition, Encoder-Decoder, Channel-Spatial Attention, Temporal Convolution, Feature Aggregation, Heuristic Local Mechanism

Related items

1	Study On Human Action Recognition Based On Non-local Spatial-temporal Residual Attention Mechanism
2	Research On Encoder-Decoder Model For Complex Structure Text Recognition
3	Research On End-to-end Scene Text Recognition Method Based On Deep Learning
4	Research On Encoder-Decoder Based Two-Dimensional Structural Text Recognition
5	Research On Scene Text Detection Algorithm Combining Dual Attention Mechanism And Dilated Convolution
6	Research On Feature Fusion Strategies Of Attention Mechanism In Image Description
7	Irregular Scene Text Recognition
8	Research On Scene Text Detection And Recognition Based On Deep Learning
9	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
10	Encoder-decoder Model For Multi-aspect Sentiment Classification