| In modern society, text has become an indispensable part of human life, appearing in many settings such as books, road signs, and the textual elements of films and literature. Text serves as a carrier of cultural information, a means of exchanging thoughts, and a bridge for the transmission of civilization. It exists in many forms, including in images and videos. With the rapid development of information technology and the internet, there is a growing demand for analyzing and processing text in images and videos with computer technology. Scene text analysis has therefore become an important research topic in computer vision. Scene text analysis, also known as optical character recognition (OCR), mainly comprises two subtasks: text detection and text recognition. This paper studies existing deep learning-based scene text detection and recognition algorithms, proposes improvements on that foundation, and enhances their performance. Finally, the text detection and recognition algorithms are applied to financial table restoration as a practical application. The specific contents are as follows: (1) To address the complex and diverse text backgrounds and the varying sizes and shapes of text in natural scenes, a new segmentation-based scene text detection network is proposed. The network improves performance through two modules: multi-scale pooling and bidirectional feature fusion. Based on the characteristics of text instances, the multi-scale pooling module uses spatial pooling with different aspect ratios to capture dependencies among text information at different distances, which guides the network toward more accurate segmentation results. The bidirectional feature fusion module constructs two fusion paths in different 
directions to make better use of the backbone network’s multi-scale features and improve detection of text of varying sizes. The method achieves competitive results on three public datasets, demonstrating its effectiveness. (2) This paper improves the SVTR text recognition algorithm and enhances the recognition model using self-supervised and semi-supervised methods. In SVTR, Local Mixing uses a window of a single size to extract local character features, which limits its ability to model the various components of characters. This paper equips the same Local Mixing with windows of multiple sizes so that it can model the relationships between character components at different scales. In addition, a better-performing attention-based decoder replaces the CTC decoder of the original SVTR, improving recognition accuracy. To further improve performance using unlabeled data, this paper adapts the SimSiam self-supervised method to text recognition and adds the Pseudo-Label semi-supervised method to the training process. The improved recognition algorithm achieves competitive results on multiple public datasets, demonstrating the effectiveness of the method. (3) The text detection and recognition algorithms are applied to table restoration in the financial domain, using real business scenarios, to convert non-editable PDF files into editable Excel tables. The pipeline comprises data preprocessing, text detection and recognition, and table structure restoration. Through pre-training and fine-tuning, this paper improves the performance of the text detection and recognition algorithms in this scenario. The algorithm was then engineered and deployed as a web project to provide services externally, achieving its practical application. |
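
The multi-size Local Mixing idea in (2) can be illustrated with a small sketch. It builds one boolean attention mask per window size over a grid of feature tokens, so that each token attends only to neighbours inside a window centred on it. The function names, the grid size, and the window sizes `(3, 3)` and `(3, 7)` are assumptions chosen for illustration, and the abstract does not say how the windows are combined (for example, one window per attention head), so this is not the paper's implementation.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def local_mask(height: int, width: int, window: Tuple[int, int]) -> Mask:
    """Build an (H*W) x (H*W) mask for a grid of H x W tokens.

    mask[q][k] is True when key token k lies inside the window centred
    on query token q. Window sizes are assumed to be odd.
    """
    half_h, half_w = window[0] // 2, window[1] // 2
    n = height * width
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_row, q_col = divmod(q, width)
        for k in range(n):
            k_row, k_col = divmod(k, width)
            mask[q][k] = (abs(q_row - k_row) <= half_h
                          and abs(q_col - k_col) <= half_w)
    return mask


def multi_window_masks(height: int, width: int,
                       windows: List[Tuple[int, int]]) -> List[Mask]:
    """One mask per window size (hypothetical helper)."""
    return [local_mask(height, width, w) for w in windows]


# Illustrative 4 x 8 token grid with a small and a wide window.
masks = multi_window_masks(4, 8, [(3, 3), (3, 7)])
```

In an attention layer, positions where the mask is False would typically be excluded before the softmax, so a wider window lets a token see more of a character's horizontal context while a smaller one focuses on fine strokes.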