
Research And Implementation Of Natural Scene Text Recognition Based On Deep Learning

Posted on: 2024-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: R M Jiang
Full Text: PDF
GTID: 2568307079966049
Subject: Electronic information
Abstract/Summary:
Text is one of the most important carriers of information in modern society, and extracting text from large numbers of images is an active research topic. As multilingual scenes become more common, single-language text recognition is increasingly limited in practical applications, so scene text recognition in multilingual settings is of great significance. Traditional multilingual methods chain language recognition and text recognition in sequence, which can accumulate errors from one stage to the next. In addition, the character set is usually larger in multilingual scenarios, and similar characters shared across languages make language recognition difficult. To address these challenges, this thesis proposes a multilingual scene text recognition algorithm that achieves higher overall recognition accuracy by combining optical text recognition with correction based on semantic information.

For optical text recognition, this thesis embeds language information into the text recognition model to improve accuracy in multilingual scenarios. Compared with chaining language recognition and text recognition, embedding language information into the recognition model is less likely to introduce accumulated errors. By jointly analyzing global and local features in the language recognition model, the method better distinguishes languages that share similar characters. The proposed algorithm is tested on the ICDAR2019-MLT dataset to validate the effectiveness of these improvements.

For text correction, this thesis exploits the rich semantics of text and proposes a CNN-based Seq2Seq model. Because errors in scene text recognition are often unrelated to semantics, correcting recognition results using text semantics is meaningful. Text correction is treated as a machine translation task, and the CNN-based architecture trains faster than RNN-based models. To adapt the model to the correction of recognition results, language information is embedded into it, the feature embedding mode is changed, and global residual connections are introduced. This thesis also proposes a method for generating a multilingual scene text correction dataset, and the effectiveness of the improved model is verified on this dataset.

Finally, to apply the proposed algorithm in practice, this thesis designs a web application for natural scene text recognition with a complete user experience.
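The abstract does not describe how the correction dataset is generated. Purely as an illustration of the general idea of building (noisy, clean) training pairs for a translation-style correction model, the sketch below corrupts clean strings by swapping visually similar characters. The confusion table, the function names `corrupt` and `make_pairs`, and the language tags are all hypothetical and are not taken from the thesis.

```python
import random

# Hypothetical confusion table (an assumption, not from the thesis):
# characters a recognizer might plausibly mix up because they look alike.
CONFUSABLE = {
    "O": ["0"], "0": ["O"],
    "l": ["1", "I"], "1": ["l"], "I": ["l"],
    "S": ["5"], "5": ["S"],
    "B": ["8"], "8": ["B"],
}

def corrupt(text, rate, rng):
    """Return a noisy copy of text: each confusable character is
    replaced by a look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in CONFUSABLE and rng.random() < rate:
            out.append(rng.choice(CONFUSABLE[ch]))
        else:
            out.append(ch)
    return "".join(out)

def make_pairs(samples, rate=0.3, seed=0):
    """Turn (language, clean_text) samples into
    (language, noisy_text, clean_text) triples for a correction model."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return [(lang, corrupt(clean, rate, rng), clean) for lang, clean in samples]

if __name__ == "__main__":
    for lang, noisy, clean in make_pairs([("en", "HELLO 2015"), ("fr", "SOLDES")]):
        print(lang, repr(noisy), "->", repr(clean))
```

In a setup like this, the language tag travels with each pair so that a correction model can condition on it, mirroring the abstract's statement that language information is embedded into the model. Real recognition errors also include insertions and deletions, which this substitution-only sketch does not cover.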
Keywords/Search Tags: Scene Text Recognition, Multi-Language, Language Classification, Text Correction