With the rapid development of multimedia technology and the spread of electronic devices, the number of text images captured by scanners, mobile phones, cameras, surveillance systems, dashboard cameras, and other devices has grown explosively. Accurately recognizing the text in images has therefore become an important research topic. Text recognition technology mainly targets scanned document images and scene text images. In recent years, scene text recognition based on deep learning has made great progress and can handle both scanned documents and scene text. However, in scenes such as reflective traffic signs, worn pavement markings, stained container numbers, and documents covered by seals, text with an incomplete structure is widespread. Existing text recognition technologies cannot recognize such incomplete text accurately, which may lead to large economic losses and safety problems.

Image inpainting technology based on deep learning can now accurately reconstruct incomplete images with complex structures and large damaged areas by combining visual features and semantic features. In theory, using image inpainting to reconstruct incomplete text into complete text, and then applying text recognition to the result, would solve the problem that incomplete text cannot be recognized correctly. However, existing image inpainting methods reconstruct the foreground and background of an image at the same time, so the background reconstruction interferes with the text structure and ultimately causes errors in the reconstructed text structure. This thesis therefore studies image inpainting techniques suited to incomplete text. To make up for the lack of incomplete text datasets, it also builds an incomplete text dataset through text synthesis. The specific contributions of this thesis are as follows:

1. To address the lack of incomplete text datasets, this thesis presents an incomplete text image synthesis method that builds on related work on complete text image synthesis, and uses it to build an incomplete text dataset (SITD) containing a rich variety of incomplete text images. The method uses convex hulls based on circles and rectangles to generate random polygons that approximate the real shapes of incomplete areas, and fills the incomplete areas with constant pixel values and with the underlying background, which greatly enriches the dataset in both form and content.

2. To address the problem that incomplete text cannot be recognized correctly, this thesis proposes an Inpainting Network for Incomplete Text (INIT). INIT abandons the approach of traditional image inpainting algorithms, which reconstruct the background and foreground at the same time, and instead gradually separates the background from the text structure during inpainting. The algorithm therefore focuses only on reconstructing the text structure, which reduces the impact of background inpainting on that reconstruction. During training, INIT is jointly supervised by a reconstruction loss and a semantic loss, which gives it the capacity for semantic reasoning and further improves its ability to reconstruct incomplete text. Incomplete text reconstructed by INIT can then be correctly recognized by advanced text recognition technology. (The paper related to this work has been accepted by the international conference ISCAS 2022, a CCF category C conference.)

3. To further improve the reconstruction of incomplete text, this thesis also proposes a Two-stage Inpainting Network for Incomplete Text (TSINIT). TSINIT is explicitly divided into two modules, a text extraction module and a text reconstruction module, which reduces the probability of inaccurate feature representation and reconstruction errors caused by interleaved execution of tasks. In the first stage of training, the two modules are trained separately so that each acquires its own independent function. In the second stage, the two modules are trained jointly, which breaks the isolation of their feature representations, enhances the exchange of information between them, improves the reconstruction of incomplete text, and further improves the accuracy of subsequent text recognition. (The paper related to this work has been submitted to the SCI journal TMM and is currently under revision.)

4. This thesis integrates the proposed incomplete text inpainting algorithms with a text detection algorithm, a text recognition algorithm, and an image classification algorithm, and designs and implements a text recognition system that can handle both complete and incomplete text. Based on actual business scenarios, the system predefines three recognition modules: an incomplete text recognition module, a mixed text recognition module, and a complete text recognition module. Users can select the module that suits their scenario and can also modify the text detection and text recognition results, which greatly improves the interactivity of the system.
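
The incomplete text synthesis described in item 1 can be illustrated with a minimal sketch. The abstract does not give the exact sampling procedure, so the code below assumes one plausible reading: points are sampled on a circle and inside a rectangle sharing the same centre, their convex hull is taken as the random polygon, and the covered pixels are replaced either by a constant value or by the background. All function names and parameters (`random_polygon`, `polygon_mask`, `apply_damage`, `n_points`, and so on) are hypothetical and are not taken from the thesis.

```python
import math
import random


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def random_polygon(cx, cy, radius, rect_w, rect_h, n_points=8, rng=random):
    """Assumed sampling scheme: points on a circle plus points inside a rectangle."""
    pts = []
    for _ in range(n_points):
        t = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((cx + radius * math.cos(t), cy + radius * math.sin(t)))
    for _ in range(n_points):
        pts.append((cx + rng.uniform(-rect_w / 2, rect_w / 2),
                    cy + rng.uniform(-rect_h / 2, rect_h / 2)))
    return convex_hull(pts)


def inside_convex(poly, x, y):
    """True if (x, y) lies inside or on a counter-clockwise convex polygon."""
    if len(poly) < 3:
        return False
    for i in range(len(poly)):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % len(poly)]
        if (bx - ax) * (y - ay) - (by - ay) * (x - ax) < 0:
            return False
    return True


def polygon_mask(poly, width, height):
    """Rasterize the polygon by testing each pixel centre."""
    return [[inside_convex(poly, x + 0.5, y + 0.5) for x in range(width)]
            for y in range(height)]


def apply_damage(image, mask, fill_value=0, background=None):
    """Fill masked pixels with the background if given, else with a constant."""
    out = [row[:] for row in image]
    for y, row in enumerate(mask):
        for x, hit in enumerate(row):
            if hit:
                out[y][x] = background[y][x] if background is not None else fill_value
    return out


# Usage: damage a blank stand-in for a rendered text image.
rng = random.Random(0)
W, H = 64, 32
text_img = [[255] * W for _ in range(H)]
poly = random_polygon(32, 16, 8, 20, 10, rng=rng)
mask = polygon_mask(poly, W, H)
damaged = apply_damage(text_img, mask, fill_value=0)
```

Passing a `background` image instead of `fill_value` gives the second fill mode mentioned in item 1, in which the incomplete area shows the underlying background rather than a constant colour.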