Research And Application Of Text Detection And Recognition Technology In Industrial Scene

Posted on: 2024-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhu
Full Text: PDF
GTID: 2568307103475504
Subject: Computer technology
Abstract/Summary:
Scene text detection and recognition, a hot research field in computer vision, has made significant progress, but its application in specific scenarios remains challenging. In the industrial intelligent warehousing scenario, industrial products on the binding production line are boxed, labeled, and then transported to the storage or shipping area. To confirm product information, transportation workers often spend a lot of time checking product labels and are prone to errors that lead to confusion in warehouse management. Introducing scene text detection and recognition technology to help workers verify product information has therefore become an urgent need. In industrial environments, label text images pose particular problems: large amounts of information, long and straight text, and low resolution. Detecting and recognizing label text quickly and accurately is the key to improving worker efficiency. In response to these issues, this thesis proposes text detection and recognition methods for industrial scenarios, with the following main contributions:

(1) For the text detection problem in industrial intelligent warehousing scenarios, a text detection model for industrial labels based on hierarchical-split feature enhancement and spatial attention is proposed. First, the model improves the backbone with a hierarchical-split block, which combines the advantages of feature multiplexing and group convolution and improves inference speed and accuracy while keeping the amount of computation comparable. Second, a parallel feature extraction strategy is adopted: a lightweight feature enhancement pyramid extracts multi-scale information, and the CARAFE upsampling operator is introduced to further improve performance. In parallel, a spatial attention mechanism is used to enhance semantic segmentation, and an adaptive scale fusion method is designed to unify the feature scales and reduce the amount of computation. Finally, the outputs of the two branches are fused, and the predicted text boxes are generated by the pixel aggregation algorithm. In addition, an improved Dice Loss function and a background loss function are introduced in the training stage to enhance the model's perception of the background.

(2) For the text recognition problem, a text recognition model for industrial product labels based on super-resolution reconstruction and Vision Transformer (ViT) is proposed. The model includes a plug-and-play lightweight super-resolution reconstruction unit. First, high-scale and low-scale branches are generated in the shallow feature extraction module to reduce computational overhead. Second, a lightweight feature extraction module is designed, which reuses information efficiently through a gradual fusion structure of skip connections and dense connections. Cross-scale shared-weight convolution is also introduced, which reduces the amount of computation while enhancing mutual learning between the high-scale and low-scale branches. Then, a global residual connection is introduced to learn the difference in high-frequency information between the original image and the high-resolution image, so that a clear text image can be generated. In the text recognition unit, the Vision Transformer, which computes in parallel, is used for sequence modeling, and a fine-grained character-level addressing module is introduced to fully mine the sequence information in ViT and strengthen the attention paid to each character, thereby obtaining the predicted text.

Comparative experiments on public datasets show that the two models in this thesis outperform existing SOTA methods. In addition, to verify the effectiveness of the proposed methods in industrial scenarios, an industrial dataset was constructed and experiments were carried out on it. The experimental results show that the models in this thesis strike a balance between accuracy and real-time performance.
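The training stage of the detection model introduces an improved Dice Loss, but the abstract does not say what the improvement is. For orientation only, the sketch below shows the standard soft Dice loss that segmentation-based text detectors commonly build on. It is not the thesis's improved variant. The function name `dice_loss`, the squared-term denominator, and the smoothing constant `eps` are assumptions made for illustration.

```python
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Standard soft Dice loss over flattened per-pixel values in [0, 1].

    pred   -- predicted text probabilities, one per pixel (hypothetical input)
    target -- ground-truth mask values, one per pixel (1 = text, 0 = background)
    eps    -- small constant (an assumption) that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    # Overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Squared-sum denominator; some formulations use plain sums instead.
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    # Dice coefficient is 1 for a perfect match, so the loss is 1 - Dice.
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


if __name__ == "__main__":
    perfect = dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
    disjoint = dice_loss([1.0, 0.0], [0.0, 1.0])
    print(f"perfect match: {perfect:.6f}, no overlap: {disjoint:.6f}")
```

Because the Dice coefficient normalizes the overlap by the sizes of the predicted and true regions, the loss is less sensitive to the imbalance between the few text pixels and the many background pixels than a plain per-pixel loss would be.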
Keywords/Search Tags: industrial intelligent warehousing, text detection, text recognition, super-resolution reconstruction