Text Image Recognition Based on Diverse Data and Weak-Supervised Learning | Posted on:2023-10-05 | Degree:Doctor | Type:Dissertation | Country:China | Candidate:C J Luo | GTID:1528306830481994 | Subject:Information and Communication Engineering | Abstract/Summary:

Text image recognition is a fundamental problem in computer vision. Many practical applications, such as intelligent traffic, product recognition, and image inspection, benefit from the rich semantic information carried by text. Text image recognition has therefore moved to the forefront of the field and is regarded as an open and challenging research problem. Models for regular printed text have achieved notable success. Nevertheless, most current recognition models remain unstable under environmental disturbances, such as the varied shapes of irregular text and complex background noise. Recognizing the wide range of individual handwriting styles also remains a great challenge. Recently, data-driven approaches based on deep learning have become dominant. Improving their performance typically requires collecting and annotating large-scale text images for training, which is time-consuming and labor-intensive.

This thesis studies text recognition from the perspective of making better use of large-scale data. We increase data diversity through data augmentation and data synthesis to improve the robustness of recognition models. We reduce the dependence on full annotations by using data in adversarial, weakly supervised, and self-supervised settings. The research is carried out in the following three aspects:

(1) We tackle the insufficient diversity of training samples by proposing a smart data augmentation approach that produces more effective and more specific training data. We also tackle the large number of handwriting styles by proposing a handwriting synthesis approach. By adjusting style parameters and content conditions, it synthesizes high-quality handwritten text images with diverse styles and rich vocabularies. Experiments show that data augmentation and synthesis significantly enrich the training samples and improve the robustness of the recognition model.

(2) We tackle the insufficient generalization of models in the wild by focusing on irregular shapes and complex background noise. We propose a weakly supervised multi-object rectification model and a self-aligned adversarial denoising model. They are jointly trained with recognition models, using only text labels as supervision, to rectify irregular shapes and remove background noise. This significantly reduces the difficulty of text image recognition and improves the performance of the recognition model.

(3) We tackle the difficulty of using large-scale unlabeled data by exploiting a property unique to text images, rather than directly adopting mainstream contrastive learning approaches. Neighboring image patches within one text line tend to share a similar style, including strokes, textures, and colors. Motivated by this observation, we propose a self-supervised representation learning scheme based on similarity-aware normalization. We use the correlation within one text line to recover an augmented patch, with its neighboring patch serving as guidance. Decoupling content from style and then recombining them improves the quality of the learned representation. Moreover, the self-supervised generative model achieves encouraging performance on extended tasks such as data synthesis, text image editing, and font interpolation, suggesting a wide range of practical applications.

This thesis proposes several approaches and ideas for text image recognition in the era of big data. We hope they encourage the field to rethink how data is used in text recognition.
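The training signal described in aspect (3) can be illustrated with a minimal sketch of how one self-supervised sample might be assembled: two adjacent patches are cut from a single text line, one is augmented, and a model would then be asked to recover the clean patch using the neighbor as style guidance. This is not the thesis's implementation. The function names, the patch width, and the additive-noise augmentation are assumptions chosen for illustration, and the similarity-aware normalization network itself is not shown.

```python
import random


def make_patch_pair(line_img, patch_w, rng):
    """Cut two horizontally adjacent patches from one text-line image.

    line_img is a list of rows (each a list of floats in [0, 1]).
    """
    width = len(line_img[0])
    if width < 2 * patch_w:
        raise ValueError("text line is too narrow for two patches")
    x = rng.randrange(0, width - 2 * patch_w + 1)
    target = [row[x:x + patch_w] for row in line_img]
    neighbor = [row[x + patch_w:x + 2 * patch_w] for row in line_img]
    return target, neighbor


def augment(patch, rng, noise_std=0.1):
    """Hypothetical augmentation: additive Gaussian noise, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in row]
        for row in patch
    ]


def build_sample(line_img, patch_w, rng):
    """Return the augmented input, the neighboring guide, and the clean target."""
    target, neighbor = make_patch_pair(line_img, patch_w, rng)
    return {"input": augment(target, rng), "guide": neighbor, "target": target}


# Stand-in for a real grayscale text-line image: 32 rows by 128 columns.
rng = random.Random(0)
line = [[rng.random() for _ in range(128)] for _ in range(32)]
sample = build_sample(line, 32, rng)
```

No labels are needed to build such a sample, which is why a scheme of this kind can draw on large unlabeled collections of text-line images.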
| Keywords/Search Tags: | Deep learning, optical character recognition, data augmentation, data synthesis, background noise removal, representation learning, generative adversarial network, weak-supervised, self-supervised, artificial intelligence |