
String Recognition Research Based On Deep Learning

Posted on: 2022-05-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Zhan
GTID: 1488306479477424
Subject: Computer application technology
Abstract/Summary:
String recognition refers to transcribing a string image into text that computers can conveniently store, retrieve, process, and understand. It is widely needed in information retrieval and in form recognition and understanding. String recognition also has great potential value in machine intelligence, autonomous driving, and related fields, and it has attracted attention from both academia and industry. Considerable progress has been made, and some results have been put into practical use. However, because string images have complex backgrounds, variable font styles, and touching (adhesive) characters, string recognition remains a challenging research problem. Traditional string recognition methods struggle to improve recognition performance further and cannot meet practical application requirements. In recent years, deep learning has developed rapidly: many end-to-end network architectures have clearly outperformed traditional methods in speech recognition, image recognition, and natural language processing, demonstrating the power of deep learning. Building on deep learning, this thesis studies the string recognition problem and proposes four models: a string recognition model based on a residual convolutional recurrent neural network, a model based on semanteme-glyph fusion embeddings, a model based on compound attention convolution and dual decoders, and an RNN-free string recognition model. These models are evaluated on various kinds of strings, including Chinese, Thai, and Vietnamese text and several kinds of digit strings. The main novelties and contributions are as follows:

1. A string recognition model based on a residual convolutional recurrent neural network is proposed. It consists of a residual convolutional network, a residual recurrent network, and connectionist temporal classification (Res-CRNN). Adding residual connections effectively improves the feature extraction and modeling capabilities of the convolutional and recurrent networks, and thereby the final string recognition performance. Experiments on multiple string datasets show that both the residual convolutional network and the residual recurrent network improve recognition performance. On the digit string recognition benchmarks CAR-A and CAR-B, Res-CRNN achieves string-level accuracy rates of 90.83% and 92.55%, respectively, the best results reported at the time.

2. A string recognition model based on semanteme-glyph fusion embeddings is proposed. This thesis exploits the correlation between the glyph and the semanteme (meaning) of Chinese characters, combining character semanteme features with character glyph features to make better use of semantic information and improve recognition performance. After the glyph and semanteme features of each character are extracted, a parameterized gated fusion strategy automatically selects the ratio of glyph feature to semanteme feature in the fusion embedding, producing a distinct fused feature for each character. An LSTM decoder usually uses the recognition result from the previous timestep to improve the prediction at the current timestep. The proposed method represents the previous prediction by its glyph-semanteme fusion embedding and uses the predictions from the last timestep and several earlier timesteps to assist the prediction at the current timestep. Experimental results show that this method makes effective use of semantic information, achieving a character-level accuracy rate of 96.65% on the Chinese string dataset ICDAR2013-HCTR, the best result reported at the time.

3. A string recognition model based on compound attention convolution and dual decoders is proposed. In some languages, vowel characters are much smaller than consonant characters, and tones appear in the text either as independent characters or as parts of characters, which poses new challenges for recognizing such strings. The proposed method first uses convolutional kernels of different sizes in the convolutional layers to extract multi-scale image features, and then applies the Convolutional Block Attention Module (CBAM) within the convolutional layers to select features, yielding better representations of small characters such as tone marks. Finally, it combines two common decoders, a CTC decoder and an LSTM decoder, to improve accuracy in the decoding stage. The method achieves character-level accuracy rates of 86.07% and 95.72% on the Thai and Vietnamese string datasets, respectively, outperforming other methods such as CRNN. Experimental results show that both the compound attention convolution and the dual decoders effectively improve the model's recognition performance.

4. An RNN-free string recognition model is proposed. Recurrent neural networks have significant advantages in modeling contextual information in sequences and are commonly used as a basic building block of sequence recognition models. However, not all strings carry rich contextual information, and recurrent neural networks are computationally expensive and difficult to parallelize and accelerate. This thesis proposes a string recognition model that does not use a recurrent neural network (the RNN-free model). After the image features are extracted, the output of the convolutional layers is connected directly to the CTC layer through a dimensional transformation of the feature map, turning the CNN-RNN-CTC model into a CNN-CTC model. Experimental results show that this method outperforms RNN-based models on string recognition tasks whose context is independent or only weakly related, such as digit strings. At test time, an RNN tends to apply the patterns it learned on the training set to the test set; when the training and test sets differ substantially, for example in string length or in the distribution of string content, the RNN-free model performs better than the model with an RNN, which demonstrates the effectiveness of the proposed method. This method achieves string-level accuracy rates of 93.23% and 94.87% on CAR-A and CAR-B, respectively, the best results reported at the time of writing. In addition, the model is faster and smaller than models that use an RNN.
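The residual recurrent network in the first contribution (Res-CRNN) adds each recurrent layer's input back to its output. The sketch below illustrates that idea in plain Python under stated assumptions: it uses a simple tanh recurrent cell instead of the LSTM layers a CRNN-style model typically uses, it requires the input and hidden sizes to match so that the sum x_t + h_t is defined, and the names `residual_rnn_layer`, `w_in`, `w_rec`, and `bias` are hypothetical, not taken from the thesis.

```python
import math
from typing import List

# Hypothetical illustration of a residual recurrent layer; not the thesis's implementation.
Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def residual_rnn_layer(xs: List[Vector], w_in: Matrix, w_rec: Matrix, bias: Vector) -> List[Vector]:
    """Run a plain tanh RNN over the sequence xs and add each input to its hidden state.

    h_t = tanh(W_in x_t + W_rec h_{t-1} + b), output y_t = x_t + h_t.
    """
    hidden = [0.0] * len(bias)
    outputs = []
    for x in xs:
        if len(x) != len(hidden):
            raise ValueError("residual sum needs input size == hidden size")
        pre = [a + b + c for a, b, c in zip(matvec(w_in, x), matvec(w_rec, hidden), bias)]
        hidden = [math.tanh(p) for p in pre]
        outputs.append([xi + hi for xi, hi in zip(x, hidden)])  # skip connection
    return outputs
```

With all weights and biases set to zero, tanh(0) = 0 and the layer passes its input through unchanged; this identity path is what residual connections are meant to preserve, so the recurrent part only has to learn a correction on top of it.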
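The second contribution fuses each character's glyph feature and semanteme feature through a parameterized gate that sets their mixing ratio per character. The abstract does not give the exact formula, so the sketch below assumes a common form of gated fusion: a gate vector g = sigmoid(W [glyph; sem] + b) and a fused embedding g ⊙ glyph + (1 - g) ⊙ sem, computed dimension by dimension. `gated_fusion`, `w_gate` (shape d × 2d), and `b_gate` are hypothetical names.

```python
import math
from typing import List

# Hypothetical gated fusion of glyph and semanteme embeddings (assumed formula).
Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(glyph: Vector, sem: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Mix two equal-length embeddings with a learned per-dimension gate.

    gate  = sigmoid(W [glyph; sem] + b), one entry per dimension
    fused = gate * glyph + (1 - gate) * sem
    """
    if len(glyph) != len(sem):
        raise ValueError("glyph and semanteme embeddings must have the same size")
    concat = glyph + sem  # list concatenation gives [glyph; sem]
    gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(w_gate, b_gate)]
    return [g * a + (1.0 - g) * s for g, a, s in zip(gate, glyph, sem)]
```

With zero weights and biases the gate is 0.5 everywhere and the fused embedding is the plain average of the two features; a large positive bias on a dimension pushes it toward the glyph feature and a large negative bias toward the semanteme feature. Because the gate depends on the inputs, each character can end up with its own ratio.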
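The third contribution applies CBAM inside the convolutional layers to emphasize features of small characters such as tone marks. CBAM applies channel attention followed by spatial attention; the sketch below shows only the channel-attention half, in plain Python: average-pooled and max-pooled channel descriptors pass through a shared two-layer MLP, are summed, and go through a sigmoid to give one weight per channel. Biases are omitted for brevity, the reduction ratio r is implied by the shapes of the hypothetical weight matrices `w0` (C/r × C) and `w1` (C × C/r), and how CBAM is combined with the multi-scale kernels in the thesis is not modeled.

```python
import math
from typing import List

# Hypothetical sketch of CBAM-style channel attention only (spatial attention omitted).
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Matrix = List[List[float]]


def _shared_mlp(v: List[float], w0: Matrix, w1: Matrix) -> List[float]:
    """Two-layer MLP W1 . ReLU(W0 . v), biases omitted for brevity."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(fmap: FeatureMap, w0: Matrix, w1: Matrix) -> FeatureMap:
    """Rescale each channel by sigmoid(MLP(avg_pool) + MLP(max_pool))."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(r) for r in ch) for ch in fmap]
    scores = [a + m for a, m in zip(_shared_mlp(avg, w0, w1), _shared_mlp(mx, w0, w1))]
    weights = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    return [[[wt * x for x in row] for row in ch] for wt, ch in zip(weights, fmap)]
```

Each channel weight lies in (0, 1), so the module can only emphasize or suppress existing channels relative to one another; with all-zero weights every channel is scaled by exactly 0.5.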
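The fourth contribution removes the recurrent layers and feeds the transformed convolutional feature map straight into CTC. A minimal sketch follows, assuming that each column of a C × H × W feature map becomes one time frame of C·H features (the thesis's exact transformation, and the projection that maps each frame to class scores, are not shown), together with standard best-path (greedy) CTC decoding with label 0 as the blank. `feature_map_to_sequence` and `ctc_greedy_decode` are hypothetical names.

```python
from typing import List, Sequence

# Hypothetical sketch of the CNN -> CTC interface; not the thesis's code.
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def feature_map_to_sequence(fmap: FeatureMap) -> List[List[float]]:
    """Turn a C x H x W feature map into W frames of C*H features, one frame per column."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(channels) for h in range(height)]
            for w in range(width)]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeated labels, then drop blanks."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded: List[int] = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

For per-frame argmax labels 1, 1, 0, 1, 2, 2 the decoder returns [1, 1, 2]: the repeated 1 survives because a blank frame separates the two runs, which is how CTC represents doubled characters such as "11" in a digit string.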
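The results above use two kinds of metric: string-level accuracy (CAR-A and CAR-B) and character-level accuracy rate (ICDAR2013-HCTR and the Thai and Vietnamese sets). The abstract does not define them, so the sketch below assumes common definitions: string-level accuracy is the fraction of exactly matched strings, and the character-level rate is the accuracy rate AR = (Nt - De - Se - Ie) / Nt used in the ICDAR 2013 handwriting competition, where Nt is the number of ground-truth characters and De, Se, Ie are the deletion, substitution, and insertion errors of a minimum edit-distance alignment. The function names are hypothetical.

```python
from typing import Sequence

# Hypothetical evaluation helpers under assumed metric definitions.


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit-cost deletions, insertions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def string_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of strings recognized exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def char_accuracy_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """AR = (Nt - De - Se - Ie) / Nt = 1 - total edit distance / total reference length."""
    total = sum(len(r) for r in refs)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return (total - errors) / total
```

For example, if one of two digit strings ("12345" and "678") is read with a single wrong digit, string-level accuracy drops to 50% while AR stays at 7/8 = 87.5%, which is why string-level figures on digit benchmarks are usually much lower than character-level ones.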
Keywords/Search Tags: String Recognition, Deep Learning, Residual Connection, Glyph-Semanteme Fusion Embedding, Compound Attention Convolution, RNN-free