In recent years, with the rapid development of Internet and mobile technologies, the amount of information stored on computers has grown exponentially. As a carrier of information, scene text images are also growing rapidly in number. Information in natural scenes is captured and stored by computers as two-dimensional RGB images. Automatic recognition of the text in such images has a wide range of applications, such as autonomous driving, bill recognition, and human-computer interaction.

In recent years, the models that have achieved the best results have mostly combined vision and semantics. These methods typically first use a feature extractor to obtain visual features from the two-dimensional image, then apply a semantics model (also known as a language model) to encode the resulting feature maps into semantic features, and finally combine the visual and semantic features to produce the recognition result. In this design, the semantics model is highly dependent on the visual features. Coupling semantic features to visual features in this way has two disadvantages. First, the semantics model degenerates into a corrector for the vision model: it is used only to revise the results produced by the vision model. Second, although such semantic correction can greatly improve accuracy on valid text, natural scene applications contain a wide range of genuinely incorrect text, for example in handwritten text recognition and marking. When the model recognizes such text and silently "corrects" it, it departs from the purpose of the task. Moreover, because the vision and semantics models are coupled in series, the overall model is bloated and difficult to train.

To solve the above problems, a novel Semantics Independence Network is proposed in this thesis, which separates the semantics module from the vision model and makes it an equal, parallel component, so that the vision model focuses on two-dimensional visual features and the semantics model focuses on one-dimensional semantic features. In addition, a vision-semantics fusion module is proposed to let the visual and semantic features interact fully. Through these two changes, the semantics module can process semantic information independently, the vision and semantics modules are fully decoupled, and the features of both parts are fully exploited.

In addition, a pruning method for analyzing the parameter redundancy of scene text recognition models is proposed for the first time. It provides a way to check whether a module should be used when designing a scene text recognition network: after the trained model is pruned module by module, the number of parameters is effectively reduced and the function of each module can be verified. A redundant-parameter pruning method is proposed, and a layer-aware pruning rate setting is introduced. By applying this post-training pruning to the proposed semantics-independent scene text recognition method, the validity of the proposed semantics and fusion modules and the parameter redundancy of the Transformer network are analyzed.

The main contributions of this thesis are summarized as follows:
(1) A text recognition network based on semantics independence is proposed. Unlike previous models, which decouple the vision and semantics models by truncating gradients, it adjusts the model structure to achieve complete structural decoupling.
(2) A new fusion module for visual and semantic features is designed, which lets the two kinds of features interact fully and makes full use of both.
(3) A new pruning method for redundant parameters is designed and applied to the proposed text recognition network. It prunes each module of the network and analyzes the redundancy of each module.
(4) A layer-aware pruning rate setting is introduced into the above pruning method. By taking the differences between layers into account, different layers are pruned to different degrees.
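The vision-semantics fusion module is described above only at a high level. As a minimal sketch of one common way such a fusion could work, the snippet below uses a learned gate to mix aligned visual and semantic feature vectors; the scalar gate parameters `w_v`, `w_s`, and `b` are hypothetical stand-ins for learned weights, and this is not necessarily the exact formulation used in the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, semantic, w_v, w_s, b):
    """Fuse two aligned feature vectors with an element-wise learned gate.

    For each position i:
        gate_i  = sigmoid(w_v * visual_i + w_s * semantic_i + b)
        fused_i = gate_i * visual_i + (1 - gate_i) * semantic_i
    so the gate decides, per element, how much each branch contributes.
    """
    fused = []
    for v, s in zip(visual, semantic):
        g = sigmoid(w_v * v + w_s * s + b)
        fused.append(g * v + (1.0 - g) * s)
    return fused
```

With all gate parameters at zero the gate is 0.5 everywhere, so the output is simply the average of the two branches; during training the parameters would move away from this neutral point so that each feature dimension leans on whichever branch is more reliable.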
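The layer-aware pruning rate setting can be illustrated with simple magnitude pruning, where each layer zeroes out its smallest-magnitude weights at its own rate instead of one global rate. This is only a sketch under that assumption; the layer names and rates below are illustrative and the thesis's actual pruning criterion may differ.

```python
def prune_layer(weights, rate):
    """Zero out the smallest-magnitude fraction `rate` of a layer's weights.

    Weights tied at the cutoff magnitude are all pruned.
    """
    k = int(len(weights) * rate)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def layer_aware_prune(model, rates):
    """Prune each named layer with its own rate (default 0.0: keep all)."""
    return {name: prune_layer(w, rates.get(name, 0.0))
            for name, w in model.items()}

# Illustrative toy model: prune half of the "encoder" weights, none of the rest.
model = {"encoder": [0.1, -0.5, 0.9, 0.2], "decoder": [1.0, -0.05]}
pruned = layer_aware_prune(model, {"encoder": 0.5})
```

Because the rate is looked up per layer, layers found to be more redundant can be cut aggressively while sensitive layers are left nearly intact, which is the motivation for a layer-aware setting over a single global rate.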