Research On Deep-Learning-Based Text Recognition And Document Segmentation And Its Application

Posted on: 2020-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z C Xie
GTID: 1368330590961687
Subject: Information and Communication Engineering
Abstract/Summary:
Text recognition and document segmentation are in wide demand in human-computer interaction, education, medical care, translation and search, and cultural heritage protection, and they are popular research topics in pattern recognition. Text recognition mainly comprises handwritten text recognition and scene text recognition. Handwritten text recognition is very challenging; its main difficulties are the huge character set, the character-touching problem, and variable-length input text. The main challenges of natural scene text recognition are complex backgrounds with multiple sources of noise, variable text shapes and appearances, diverse colors and fonts, and varied text ordering. In document segmentation, the difficulties stem mainly from the diversity of document layouts, touching characters, and, in ancient documents, damage, aging, and stains. Focusing on text recognition and document segmentation, this dissertation studies the application of deep learning to document processing, and in particular deep-learning-based text recognition and text segmentation. Specifically, its main novelties and contributions are as follows:

(1) For handwritten text recognition, we propose a novel solution comprising the path signature, a multi-spatial-context fully convolutional recurrent network (MC-FCRN), and an implicit language model (implicit LM). We develop a novel segmentation-free MC-FCRN to effectively capture the variable spatial contextual dynamics as well as the character information for high-performance recognition. With a series of receptive fields of different scales, MC-FCRN can model complicated spatial context with strong robustness and high accuracy. The residual recurrent network, a basic component of MC-FCRN, not only accelerates convergence but also improves the optimization result, while adding neither extra parameters nor computational burden compared with an ordinary stacked recurrent network. Finally, we propose an implicit LM that learns to model the output distribution given the entire predicting feature sequence. Unlike a statistical language model, which predicts the next word given only a few previous words, our implicit LM exploits semantic context from both the forward and reverse directions of the text and over arbitrary text lengths.

(2) For the new problem of unconstrained online handwritten text recognition, we suggest a special perspective on the pen-tip trajectory, namely focusing only on the variation between adjacent points, to reduce the difference between texts of multiple styles. Owing to the lack of relevant data, we develop a new data augmentation method to synthesize unconstrained handwritten texts of multiple styles, including horizontal, vertical, overlapped, right-down, screw-rotation, and multi-line text. To better model the unconstrained pen-tip trajectory, we propose a multi-layer distilling GRU that processes the input data sequentially and accelerates convergence without sacrificing recognition accuracy.

(3) For scene text recognition and offline handwritten text recognition, we propose a novel aggregation cross-entropy (ACE) loss function whose performance is competitive with CTC and the attention mechanism. Owing to its simplicity, the ACE loss function is, compared with CTC and the attention mechanism, much quicker to implement (only four fundamental formulas), faster in inference and back-propagation (approximately O(1) in parallel), less memory-demanding (no parameters, only basic runtime memory), and convenient to use (simply replace CTC with ACE). The ACE loss function can be adapted to 2D prediction problems by flattening the 2D prediction into a 1D prediction. Because it does not require instance order information for supervision, it can be applied beyond sequence recognition, e.g., to counting problems.

(4) For document segmentation, we propose a novel system consisting of four main stages: preprocessing, boundary box segmentation (BBS), incremental weakly supervised learning, and recognition-guided attention boundary box segmentation (Rg-ABBS). The character segmentation problem is formulated from the perspective of Bayesian decision theory. By maximizing the posterior probability of the class sequence given the text line image, we derive three new algorithms to search for the segmentation path. In addition, we propose a judgment gate (JG) mechanism that enables incremental weakly supervised learning of the character recognition network (i.e., the character recognizer), which provides reliable character recognition scores to improve character segmentation results. The proposed Rg-ABBS significantly reduces time consumption by performing recognition-guided segmentation only on the 'attention' area while still achieving promising performance.
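
As an illustration of contribution (3), the following is a minimal Python sketch of an order-free loss in the spirit of ACE. The abstract does not state the four formulas, so the formulation used here is an assumption: sum each class's probability over all time steps, normalize by the sequence length, and take the cross-entropy against the normalized character counts, with the unused steps assigned to a blank class. The names `ace_loss`, `probs`, `label`, and `blank` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def ace_loss(probs, label, blank=0):
    """Order-free aggregation loss (illustrative sketch, not the author's code).

    probs: list of T rows; each row is a probability distribution over C classes.
    label: list of non-blank class indices; their order is ignored.
    blank: index of the blank class (assumed convention).
    """
    T = len(probs)
    C = len(probs[0])
    if len(label) > T:
        raise ValueError("label is longer than the prediction sequence")

    # Aggregate each class's probability over all time steps, normalized by T.
    aggregated = [sum(row[k] for row in probs) / T for k in range(C)]

    # Count how often each class occurs; remaining steps are counted as blank.
    counts = Counter(label)
    counts[blank] = T - len(label)

    # Cross-entropy between normalized counts and aggregated probabilities.
    eps = 1e-10  # avoids log(0)
    return -sum((counts[k] / T) * math.log(aggregated[k] + eps) for k in range(C))


# Four time steps, three classes (class 0 is blank).
good = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
bad = [[1.0, 0.0, 0.0]] * 4

loss_good = ace_loss(good, [1, 2])
loss_bad = ace_loss(bad, [1, 2])
loss_swapped = ace_loss(good, [2, 1])
```

Because only per-class counts enter the loss, permuting `label` leaves the value unchanged, which is the property that would let such a loss supervise counting tasks as well as sequence recognition.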
Keywords/Search Tags: Deep learning, handwritten text recognition, scene text recognition, document segmentation