
Printed Text Recognition Based On Deep Neural Network

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Liu
Full Text: PDF
GTID: 2428330602981486
Subject: Software engineering
Abstract/Summary:
In the information society, people must handle large volumes of text data of many kinds. Text recognition technology emerged to speed up information entry and has broad application prospects. At present, most mature printed text recognition systems and software on the market target a single application scenario, such as invoice recognition, ID card recognition, or document recognition. They can therefore recognize only one type of image and text, and a universal character recognition system that handles varied document types and fonts is still lacking.

To this end, this thesis implements a printed text recognition algorithm suitable for a variety of document types and fonts. The supported types include books, publications, posters, leaflets, bills, and other plain text images containing printed text. There are 13 types of characters that can be recognized, including first- and second-level Chinese characters and uncommon Chinese characters, uppercase and lowercase English letters, digits, and common punctuation marks, for a total of 6870 characters. Characters of different types can appear mixed in the same text, so the scope of application is wider.

The thesis describes the processing flow of printed text recognition in detail: image tilt correction, text detection, text recognition, and result verification. It implements a text recognition algorithm for printed text images acquired by electronic devices such as scanners and cameras, and completes the network training. Paper text is captured by a scanner or camera to produce a text image, and digital image processing and deep learning algorithms then recognize the text in that image. This extracts text information quickly and saves time and labor costs, which gives the work practical value and theoretical significance for information processing and related fields.

The main work of this thesis includes the following aspects:

(1) To address the difficulty of obtaining real image datasets, this thesis synthesizes printed text images on a large scale. The dataset contains rich semantic information and varies in text features, background, degree of blur, and other aspects, which improves the robustness of the model.

(2) This thesis implements a printed text recognition algorithm for real scenes, consisting of a text detection algorithm based on the CTPN model and a text recognition algorithm based on the CRNN model. Combining the two achieves end-to-end recognition of large texts. Building on existing research, the thesis adjusts and tests the network structure and parameters, and through a large number of comparative experiments obtains a network model suited to printed text recognition that shows good recognition results.

(3) To verify the recognition results, this thesis designs and implements a result verification algorithm based on template matching. The algorithm combines the matching degree, character position, and other information to detect misrecognized and missed characters and to correct typos and missing characters, which further improves character recognition accuracy to 99.5%.
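As an illustration of the recognition stage, the sketch below shows greedy CTC decoding, the step that a CRNN-style recognizer commonly uses to turn per-frame class scores into a character string. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not describe the decoding step, and the function name `ctc_greedy_decode`, the blank index `BLANK = 0`, the helper `one_hot`, and the three-character `charset` are all hypothetical choices made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class 0 is the CTC blank label


def ctc_greedy_decode(frame_scores, charset):
    """Decode per-frame class scores into text with greedy CTC decoding.

    frame_scores: list of lists, one list of class scores per image frame.
    charset: string in which charset[i] is the character for class i + 1.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining labels to characters.
    return "".join(charset[label - 1] for label in collapsed if label != BLANK)


def one_hot(index, size):
    """Hypothetical helper: build a score vector with a single peak."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy example: 3 characters plus the blank gives 4 classes.
charset = "ab1"
path = [1, 1, 0, 1, 2, 2, 0, 3]  # frame-wise best labels
frames = [one_hot(label, 4) for label in path]
decoded = ctc_greedy_decode(frames, charset)
print(decoded)  # prints "aab1"
```

The blank label between the two runs of class 1 is what lets the decoder keep a genuinely doubled character ("aa") while still merging repeated frames of a single character. A real system would take `frame_scores` from the network output and use the full 6870-character set in place of the toy `charset`.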
Keywords/Search Tags: Neural Network, Printed Text Recognition, Text Detection, Dataset Generation, Template Matching