Research On End-to-End Text Recognition In Document Images And Its Applications

Posted on:2021-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:D Q Tang

GTID:2428330647950751

Subject:Computer technology

Abstract/Summary:

Document images such as receipts, certificates, and posters play an important role in people's daily lives. Effectively recognizing and extracting the text in such images helps people better understand their content. At the same time, with the popularity of smartphones, the demand for on-device document image analysis is increasing. For example, when recognizing personal documents or commercial contracts, people prefer to perform text recognition offline on the mobile device itself, to ensure information security and avoid leaking private information. However, most current deep-learning-based text recognition methods focus on neural network design and are difficult to adapt to real-time operation on mobile devices. To address these challenges, this thesis studies real-time document analysis and recognition suitable for mobile devices and proposes a novel end-to-end text recognition method for document images based on differentiable binarization. Building on this, it further proposes an efficient weakly supervised structured recognition method for the specific application of ticket recognition. The main contributions are as follows:

1. Current deep-learning-based text recognition methods cannot meet the needs of real-time operation on mobile devices. To address this, this thesis integrates text detection and text recognition into a single neural network through feature sharing. It also uses a lightweight backbone network and a feature fusion method to improve the network's efficiency. In addition, the time-consuming post-processing of text detection is simplified as much as possible. The binarization threshold of the semantic segmentation is optimized as a differentiable variable inside the network. As a result, post-processing needs only a simple binarization operation to accurately segment text regions from the image and to separate different text instances. To improve the detection of dense and long text in document images, the end-to-end recognition method is built on segmentation. This lets it model an entire text line through local receptive fields and avoids the incomplete coverage caused by an insufficient receptive field. Experiments on several benchmark datasets show the effectiveness and efficiency of the proposed end-to-end recognition method. For example, on the scanned receipt dataset SROIE, the F1 score and speed improve by 3.6% and 2.9 fps (frames per second), respectively, over the end-to-end text recognition method FOTS. They improve by 22.9% and 9.7 fps, respectively, over the two-stage recognition method CTPN-CRNN.

2. Specific document analysis and recognition scenarios such as tickets face two challenges: real samples are scarce, and key information is difficult to extract. This thesis therefore designs a weakly supervised structured recognition method for such images:
(1) Based on the idea of style transfer, a data augmentation scheme for ticket images is proposed as a preprocessing step to generate large-scale, realistic training data;
(2) The goal is to parse key information (such as the passenger, departure station, and arrival station on a train ticket) from a ticket image and output it in structured form. To do this, a perception module with extremely low computational cost replaces the explicit text detection step used in previous document recognition methods. This module adaptively locates the regions of interest in the ticket image. After global average pooling, each field corresponds to a specific feature sequence, which a recognition module then decodes. The structured result containing every field is therefore output directly, avoiding a cumbersome and complicated parsing process.
Experiments on benchmark datasets show that the proposed structured recognition method achieves state-of-the-art performance and efficiency. For instance, on the train ticket dataset, accuracy and speed improve by 6.8% and 12.3 fps over PixelLink-CRNN. After style transfer is used to augment the training data, the accuracy of the proposed method on the train ticket dataset improves by a further nearly 10%.
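The differentiable binarization in contribution 1 can be illustrated with a minimal NumPy sketch. The abstract does not give the formula, so this sketch assumes the form used by the DB detector (Liao et al., AAAI 2020). That form is an approximate binary map B = 1 / (1 + exp(-k(P - T))), where P is the predicted probability map, T is the learned threshold map, and k = 50 is an amplification factor. Replacing the step function with a steep sigmoid lets gradients flow into T during training. At inference, a hard comparison followed by connected-component labeling is enough to split text instances. The function names and the 0.3 threshold in the usage lines are hypothetical, and DB's contour dilation step is omitted.

```python
import numpy as np
from collections import deque


def approx_binary_map(prob, thresh, k=50.0):
    """Differentiable binarization (assumed DB form): steep sigmoid of P - T.

    prob, thresh: arrays of the same shape; k = 50 follows the DB paper and
    is an assumption, not a value reported by this thesis.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob - thresh)))


def label_text_regions(binary):
    """Inference post-processing: 4-connected components of a boolean map.

    Returns (labels, count); each component is treated as one text instance.
    """
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                count += 1  # start a new component with a breadth-first flood fill
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


# Usage with a toy 3x4 probability map and a constant (hypothetical) threshold map.
prob = np.array([[0.9, 0.8, 0.1, 0.7],
                 [0.9, 0.2, 0.1, 0.8],
                 [0.1, 0.1, 0.1, 0.1]])
thresh = np.full_like(prob, 0.3)
soft = approx_binary_map(prob, thresh)  # differentiable; used for the training loss
labels, n = label_text_regions(prob > thresh)  # hard threshold at inference
print(n)  # 2
```

In this sketch the soft map is needed only for the training loss. Inference uses the cheap hard threshold, which is what keeps post-processing to a simple binarization step.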
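The abstract says the perception module adaptively finds regions of interest and that global average pooling then gives one feature sequence per field, but it does not say how. The NumPy sketch below is one hypothetical reading, not the thesis's implementation. It predicts one soft mask per field (for example passenger, departure station, arrival station) and weights the shared feature map by each mask. It then averages over the height axis only, so that the width axis survives as a sequence that a recognition head (such as a CTC decoder) could consume. The function name, the (C, H, W) layout, and the sigmoid masks are all assumptions.

```python
import numpy as np


def field_sequences(features, field_logits, eps=1e-6):
    """Pool a shared feature map into one feature sequence per field (hypothetical).

    features:     (C, H, W) shared backbone features (assumed layout)
    field_logits: (N, H, W) one region-of-interest logit map per field
    returns:      (N, W, C) a width-wise sequence of C-dim vectors per field
    """
    # Soft region-of-interest masks in (0, 1); sigmoid is an assumed choice.
    masks = 1.0 / (1.0 + np.exp(-field_logits))                # (N, H, W)
    # Weight the shared features by each field's mask via broadcasting.
    weighted = masks[:, None, :, :] * features[None, :, :, :]  # (N, C, H, W)
    # Mask-weighted average over the height axis; eps avoids division by zero.
    denom = masks.sum(axis=1)[:, None, :] + eps                # (N, 1, W)
    pooled = weighted.sum(axis=2) / denom                      # (N, C, W)
    return pooled.transpose(0, 2, 1)                           # (N, W, C)


# Usage: 3 hypothetical fields over a 64-channel, 8x32 feature map.
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8, 32))
logits = rng.standard_normal((3, 8, 32))
seqs = field_sequences(feats, logits)
print(seqs.shape)  # (3, 32, 64)
```

Pooling only over height is a design choice made here so that each field keeps a left-to-right sequence. A fully global pool would reduce each field to a single vector, which a sequence recognizer could not decode character by character.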

Keywords/Search Tags:

Document Analysis, Text Detection, Text Recognition, Structured Recognition, Data Augmentation

Related items

1	Research On Deep Learning Based Text Detection And Recognition
2	Research And Implementation Of Key Technologies For Medical Document Text Recognition Based On Deep Learning
3	Research On Real-time Detection And Recognition Of Dense Text In Natural Scenes
4	Design And Implementation Of Scene Text Recognition System
5	Research On Short Text Sentiment Analysis And Its Applications
6	Research On Sentiment Analysis Based On Text Data Augmentation And Hybrid Model
7	Design And Implementation Of Handwritten Chinese Character Recognition Platform Based On Text Recognition
8	The Method And Its Application Of Speech Text Analysis Based On Multi Document Summarization
9	Research On Deep-Learning-Based Scene Text Detection And End-to-End Recognition
10	Text Detection And Recognition In Invoice-style Data