Research On Real-time Detection And Recognition Of Dense Text In Natural Scenes

Posted on:2024-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:N Y He

GTID:2568307079955439

Subject:Information and Communication Engineering

Abstract/Summary:

Text in natural scenes is a common source of information and plays an important role in helping computer vision systems understand scenes. Using scene text detection and recognition technology to obtain text information from images and videos has become a research hotspot in computer vision and document analysis. Many research results have been widely applied in autonomous driving, scene parsing, image retrieval and other fields. However, many state-of-the-art text detection and recognition methods are optimized for public datasets that contain only sparse text and few Chinese text instances. In dense Chinese text scenes, the accuracy and latency of deep learning based text detection and recognition methods still need to be improved. Based on a Chinese text detection dataset of reading scenes and a synthetic Chinese text recognition dataset, this thesis optimizes the performance of text detection and recognition algorithms in dense text scenes. The specific work is as follows:

(1) Existing text detection methods are weak at extracting features of dense text objects. To solve this issue, this thesis proposes a multi-scale feature based text detection method. It designs a multi-scale feature fusion module and a grouping spatial attention feature enhancement module, which reduce information loss during feature sampling, enhance important multi-scale features and suppress noise. The method makes full use of high-level semantic features and low-level detail features, thereby enhancing the model’s feature representation ability and improving the detection accuracy of dense text objects.

(2) Because the multi-scale feature based text detection method has poor real-time performance, this thesis proposes model compression methods based on lightweight structure design and structured pruning. First, it proposes a lightweight text detection model through structural design, manually reducing the model’s parameter count and computational complexity. Then, it proposes a channel attention module that calculates the channel weights of the feature map as a criterion for pruning convolution kernels. Finally, it compresses the lightweight text detection model by structured pruning and obtains a scene text detector that balances accuracy and speed.

(3) In the typical text recognition framework, the sequence modeling operations may lead to the loss of high-level semantic structure information. To solve this problem, this thesis proposes a text recognition model based on shifted-window multi-head self-attention. The model removes the convolutional neural network of the classic framework and uses a Transformer to extract spatial features and perform sequence modeling. Finally, a CTC decoder is applied to align the output prediction sequence. The model simplifies the typical framework of text recognition methods and avoids the information loss caused by modeling feature maps into feature sequences.
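
The CTC decoding step mentioned in (3) can be illustrated with a minimal sketch. The abstract does not say which decoding strategy the thesis uses, so greedy (best-path) decoding is assumed here; the function name, the blank index of 0 and the toy character set are hypothetical and chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (not stated in the abstract)


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Greedy CTC decoding of a sequence of per-frame class scores.

    frame_scores: list of lists, one score per class for each time step.
    charset: mapping from class index to character (blank excluded).
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse runs of identical consecutive predictions.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal labels keeps both of them.
    return "".join(charset[idx] for idx in collapsed if idx != blank)


if __name__ == "__main__":
    # Toy example with classes: 0 = blank, 1 = "a", 2 = "b".
    charset = {1: "a", 2: "b"}
    path = [1, 1, 0, 1, 2, 2, 0]
    # One-hot scores standing in for the recognizer's output.
    scores = [[1.0 if c == p else 0.0 for c in range(3)] for p in path]
    print(ctc_greedy_decode(scores, charset))  # prints "aab"
```

Because the blank separates the two runs of "a", both are kept, which is how CTC aligns a longer frame sequence to a shorter text without needing per-character position labels.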

Keywords/Search Tags:

Text detection, Text recognition, Multi-scale feature, Attention mechanism, Structured pruning

Related items

1	Research And Application Of Natural Scene Text Detection Algorithm Based On Deep Learning
2	Scene Text Detection And Recognition Based On Deep Learning
3	Research And Implementation Of Scene Text Detection And Recognition Based On Channel Attention Mechanism
4	Research And Application Of Text Recognition Algorithm In Complex Scenes
5	Research On Scene Text Detection And Recognition Based On Deep Learning
6	Research On OCR Detection And Recognition Technology Based On Deep Learning
7	A Bad Text Recognition Based On Multi-feature Graph Convolutional Embedding
8	Research On Scene Text Detection Method Based On Deep Neural Network
9	Text Detection In Complex Scene Based On Multi-scale Information Preservation
10	Irregular Text Recognition From Complex Scenes