
Research On End-to-End Text Recognition In Document Images And Its Applications

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: D Q Tang
Full Text: PDF
GTID: 2428330647950751
Subject: Computer technology
Abstract/Summary:
Document images such as receipts, certificates, and posters play an important role in people's daily lives. Effectively recognizing and extracting the text in such images helps people better understand their content. At the same time, with the popularity of smartphones, the demand for on-device document image analysis is increasing. For example, in application scenarios such as recognizing personal documents or commercial contracts, people prefer to perform text recognition directly on the mobile device in an offline environment, so as to ensure information security and avoid leaking private information. However, most current deep learning based text recognition methods focus mainly on the design of the neural network, which makes it difficult for them to meet the demand for real-time operation on mobile devices. In view of these difficulties and challenges, this thesis studies in depth the problem of real-time document analysis and recognition suitable for mobile devices, and proposes a novel end-to-end text recognition method for document images based on differentiable binarization. On this basis, aiming at the specific application scenario of ticket recognition, an efficient weakly supervised structured recognition method is further proposed. The specific work of this thesis is as follows:

1. To address the problem that current deep learning based text recognition methods cannot meet the needs of real-time operation on mobile devices, this thesis integrates text detection and text recognition into a single neural network through feature sharing, and uses a lightweight backbone network and feature fusion method to improve the efficiency of the network. In addition, the time-consuming post-processing of text detection is simplified as much as possible. The binarization threshold of the semantic segmentation is optimized as a differentiable variable within the neural network. As a result, post-processing requires only a simple but effective binarization operation to accurately segment the text areas from the image and to separate different text instances. To improve detection performance on the dense and long texts found in document images, this thesis designs an end-to-end recognition method based on the idea of segmentation, which can model an entire text line using only a local receptive field, avoiding the incomplete coverage caused by an insufficient receptive field. Experiments on several benchmark datasets show the effectiveness and efficiency of the proposed end-to-end recognition method. For example, on the scanned receipt dataset SROIE, the F1 value and speed are improved by 3.6% and 2.9 fps (frames per second) respectively compared with the end-to-end text recognition method FOTS, and by 22.9% and 9.7 fps respectively compared with the two-stage recognition method CTPN-CRNN.

2. Document analysis and recognition in particular scenarios such as tickets faces two challenges: the scarcity of real samples and the difficulty of extracting key information. Therefore, this thesis designs a weakly supervised structured recognition method for such images: (1) Based on the idea of style transfer, a ticket-type data augmentation scheme is proposed as a preprocessing step to generate large-scale realistic training data; (2) In order to parse the key information (such as the passenger, departure station, and arrival station on a train ticket) from the ticket image and output it in a structured form, a perception module with extremely low computational cost is proposed to replace the explicit text detection operation used in previous document recognition methods. This module adaptively finds the regions of interest in the ticket image. After global average pooling, each field corresponds to a specific feature sequence, which a recognition module then decodes to output structured results containing every field directly, avoiding a cumbersome and complicated parsing process. Experiments on the benchmark datasets show that the proposed structured recognition method achieves state-of-the-art results in both performance and efficiency. For instance, on the train ticket dataset, the accuracy and speed are improved by 6.8% and 12.3 fps compared with PixelLink-CRNN. After style transfer is used to augment the training data, the accuracy of the proposed recognition method improves further by nearly 10% on the train ticket dataset.
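The idea in item 1 of treating the binarization threshold as a differentiable quantity can be illustrated with a small sketch. The abstract does not give the formula, so the sigmoid form B = 1 / (1 + exp(-k (P - T))), the amplification factor k = 50, and the function names `differentiable_binarization` and `hard_binarize` are all assumptions borrowed from the commonly used differentiable binarization formulation, not the thesis's confirmed implementation.

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Soft, differentiable stand-in for the step function (assumed form).

    prob_map   : per-pixel text probability P in [0, 1]
    thresh_map : per-pixel threshold T (learned by the network in training)
    k          : amplification factor (assumed value, not from the abstract)
    """
    # B = 1 / (1 + exp(-k * (P - T))); approaches a hard step as k grows
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

def hard_binarize(prob_map, thresh=0.5):
    """Simple inference-time post-processing: one fixed threshold."""
    return (prob_map > thresh).astype(np.uint8)

# Toy 2x2 probability map with a constant threshold map
prob = np.array([[0.9, 0.1],
                 [0.5, 0.6]])
thresh = np.full_like(prob, 0.5)

soft = differentiable_binarization(prob, thresh)  # used while training
hard = hard_binarize(prob)                        # used at inference
```

Because the soft map is differentiable in both P and T, the threshold can be trained jointly with the segmentation output, and inference can fall back to the cheap hard threshold, which is consistent with the simplified post-processing described above.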
Keywords/Search Tags: Document Analysis, Text Detection, Text Recognition, Structured Recognition, Data Augmentation