Research On Encoder-Decoder Based Two-Dimensional Structural Text Recognition

Posted on: 2021-04-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Zhang
GTID: 1368330602494253
Subject: Information and Communication Engineering
Abstract/Summary:
Text recognition concerns how to make machines correctly recognize characters or text in handwritten, printed, or image input, so that machines can understand what the input means. Progress in text recognition research benefits people's daily life through many practical applications, such as handwriting input on portable devices, photo translation, and machine marking. Within text recognition, two-dimensional structural text recognition is much more difficult than ordinary one-dimensional text recognition, because the complex structure of two-dimensional text is hard to parse. Taking machine marking as an example, recognizing Chinese or English text is a one-dimensional problem because the recognition order is one-dimensional, usually from left to right. Recognizing a mathematical expression, however, is a two-dimensional problem because its internal structure is two-dimensional, e.g., the above-below structure arising from fractions and the inside structure arising from roots. Research on two-dimensional structural text is therefore important for improving the general practical value of text recognition.

Recently, with the progress of deep learning, more and more methods have been proposed for many applications; for example, deep learning methods have greatly improved the recognition of English and Chinese text. However, because of the complex two-dimensional structure, current deep learning methods are difficult to apply directly to two-dimensional text recognition, so recognition performance on one-dimensional text is much better than on two-dimensional text. In this dissertation, we propose to employ the encoder-decoder as an end-to-end method for two-dimensional text recognition. Encoder-decoder models have been applied successfully to many image-to-sequence and sequence-to-sequence translation tasks, so we believe they can significantly improve two-dimensional text recognition. To further improve performance, we also modify the classic encoder-decoder in several respects to improve its ability to generalize to two-dimensional structures. The main contributions of this dissertation are summarized as follows:

1. We propose a novel string decoder based encoder-decoder model for offline two-dimensional text recognition. We first use a convolutional neural network to extract high-level visual features from input images, then use an attention model to perform symbol alignment and detect relations among symbols, and finally use a recurrent neural network based decoder to parse the complex two-dimensional structures. This method jointly optimizes symbol recognition and structural analysis in a global manner and greatly improves the recognition performance on mathematical expressions. The proposed method also enables radical-based Chinese character recognition.

2. We propose a novel string decoder based encoder-decoder model for online two-dimensional text recognition. Compared with offline static images, online dynamic trajectories contain sequential information. We first employ a recurrent neural network to extract features from the sequential input, then exploit the complementarity between the online and offline modalities, which helps handle the delayed-stroke and inserted-stroke problems that arise in online recognition. In addition, the dynamic trajectory provides an accurate stroke-based alignment, which can be used to enhance the alignment generated by the attention model and ultimately improve recognition performance.

3. We propose a novel tree decoder based encoder-decoder model for online two-dimensional text recognition. The inherent representation of two-dimensional structural text is a tree, so a tree decoder should generalize better to complex structures. However, directly using symbols as tree nodes introduces ambiguity because symbols can be repeated, and hence few studies have applied a tree decoder to two-dimensional text recognition. In this dissertation, we propose to use the absolute spatial positions of symbols, which are information unique to online input, as tree nodes, because spatial positions can distinguish otherwise ambiguous symbols. The dynamic trajectory based tree decoder greatly improves recognition performance and generalization ability on online two-dimensional structural text recognition.

4. To overcome the restriction on input modality when using a tree decoder, we further propose a general tree decoder that can be applied to any input modality. We propose a novel memory module to store output symbols and then use the positions of the symbols stored in memory as the tree nodes, so that no ambiguity arises. To train the tree decoder properly, we also propose a novel memory attention model and a spatial attention self-regularization module. The proposed general tree decoder greatly improves performance and generalization ability on offline two-dimensional structural text recognition.
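To illustrate why contribution 4 refers to parent nodes by their position in the output memory and not by their symbol, the following minimal Python sketch builds a tree for the expression \frac{a}{a+1}, in which the symbol "a" appears twice. It is only an illustration under stated assumptions, not the dissertation's model: the `Node` layout, the relation labels ("start", "above", "below", "right"), and the helper functions are hypothetical names chosen for this example, and the memory is written by hand where the real system would have a neural decoder produce it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    symbol: str            # recognized symbol
    parent: Optional[int]  # position of the parent in memory (None for the root)
    relation: str          # hypothetical label for the relation to the parent


# Hand-written output memory for \frac{a}{a+1}; the symbol "a" occurs twice.
memory: List[Node] = [
    Node("\\frac", None, "start"),
    Node("a", 0, "above"),   # numerator
    Node("a", 0, "below"),   # first symbol of the denominator
    Node("+", 2, "right"),   # attaches to the second "a" by memory position
    Node("1", 3, "right"),
]


def positions_of(memory: List[Node], symbol: str) -> List[int]:
    """Return every memory position holding `symbol`.

    More than one position means the symbol alone cannot identify a parent.
    """
    return [i for i, node in enumerate(memory) if node.symbol == symbol]


def child(memory: List[Node], parent: int, relation: str) -> Optional[int]:
    """Return the position of the first child of `parent` with `relation`, if any."""
    for i, node in enumerate(memory):
        if node.parent == parent and node.relation == relation:
            return i
    return None


def render(memory: List[Node], idx: Optional[int]) -> str:
    """Serialize the subtree rooted at `idx` back into a LaTeX-like string."""
    if idx is None:
        return ""
    node = memory[idx]
    out = node.symbol
    if node.symbol == "\\frac":
        out += "{" + render(memory, child(memory, idx, "above")) + "}"
        out += "{" + render(memory, child(memory, idx, "below")) + "}"
    # Continue along the "right" neighbor on the same baseline.
    return out + render(memory, child(memory, idx, "right"))


print(positions_of(memory, "a"))  # two candidates: the symbol is ambiguous
print(render(memory, 0))          # the tree still serializes unambiguously
```

Because "+" stores the parent position 2 instead of the symbol "a", it attaches to the denominator and not the numerator, even though both nodes carry the same symbol.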
Keywords/Search Tags: two-dimensional structural text recognition, mathematical expression recognition, radical-based Chinese character recognition, encoder-decoder, attention, tree decoder