
Deep Learning Based Text Sequence Recognition System

Posted on: 2019-11-01  Degree: Master  Type: Thesis
Country: China  Candidate: J H You  Full Text: PDF
GTID: 2428330590473922  Subject: Computer Science and Technology
Abstract/Summary:
China's artificial intelligence technology has entered a phase of rapid development. Scene text recognition, which combines computer vision and natural language processing, has received extensive attention from government, industry, and academia. The State Council has proposed an artificial intelligence development plan which states that China will apply artificial intelligence widely in areas such as education, medical care, elderly care, environmental protection, and urban construction. In-image text recognition can effectively support these applications. However, the images in real application scenarios are often very complicated: they usually contain interference from complex backgrounds, and the text itself varies in content, typesetting, glyphs, and so on. A text recognition engine that is accurate, fast, and robust in practical applications is therefore urgently needed. At present, annotated text recognition data for open scenes such as the emerging Internet is extremely scarce, and the speed and accuracy of current scene text recognition models cannot meet the requirements of real applications. This thesis therefore studies the scene text recognition task.

To address the lack of annotated recognition data for emerging Internet scenes, we designed an image synthesis engine. By analyzing how web images are generated and the statistical characteristics of the original web text image data, the engine fits the distribution of the original web images using image processing techniques and hand-crafted rules. The final engine can generate synthetic text images that are independent of, and identically distributed with, the original web images. The thesis verifies that using the synthetic images solely to expand the model's training data improves the average edit-distance recognition accuracy from 66.79% to 84.47% on the test split partitioned from the ICPR MTWI data set, which greatly improves the model's generalization and recognition capabilities in complex scenes. The model based on the image synthesis engine also ranked second among the universities participating in Task 1 of the 2018 ICPR MTWI competition.

To address the insufficient feature extraction of current text recognition models and the inability of LSTM to predict sequences in parallel, this thesis proposes a parallel sequence recognition framework, Res PNN. It uses an ultra-deep residual network to extract features from text images, followed by location-vector sequence re-modeling and multi-layer perceptron classification. The framework achieved the best or second-best results on multiple ICDAR text recognition data sets. The recognition model based on this framework also achieved the highest transcription accuracy, 94.62%, in the word-level basic task of the 2017 ICDAR IEHHR competition, ranking first.

To handle the large differences in color, background, and similar properties found in open scenes, a residual structure named Max-Inception is designed, based on multi-size convolution and Maxout activation. The structure can apply different convolutional receptive fields and convolution layers to different text images, which increases the model's ability to extract character features from complex images. Experiments show that the structure effectively improves the feature extraction ability of the Res PNN and CRNN text recognition models, and thus the overall recognition ability of the models.
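The abstract reports an "average edit-distance recognition accuracy" but does not define it. The following is a minimal sketch of one common reading, assuming the score for each sample is one minus the Levenshtein distance divided by the length of the longer string, averaged over all samples. The function names and this normalization are illustrative assumptions, not the thesis's or the competition's confirmed definition.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def average_edit_distance_accuracy(predictions: Sequence[str],
                                   references: Sequence[str]) -> float:
    """Mean of 1 - distance / max(len) over all pairs (assumed normalization)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one sample is required")
    scores = []
    for pred, ref in zip(predictions, references):
        longest = max(len(pred), len(ref))
        # Two empty strings count as a perfect match.
        scores.append(1.0 if longest == 0
                      else 1.0 - edit_distance(pred, ref) / longest)
    return sum(scores) / len(scores)
```

Under this assumed definition, a prediction that differs from a four-character reference by one character scores 0.75, so the metric gives partial credit for nearly correct transcriptions rather than requiring an exact match.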
Keywords/Search Tags: scene text recognition, sequence labeling, deep learning, picture synthesis, artificial intelligence