
Research And Implementation Of Form Recognition System Based On Deep Learning

Posted on: 2019-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Xiong
Full Text: PDF
GTID: 2428330572463629
Subject: Computer technology
Abstract/Summary:
Forms exist in both paper and digital (electronic) form in daily life. Paper forms are easier to read and share, while digital forms are easier to index and store. For more effective knowledge management, it is usually necessary first to convert the paper form to a digital format and then to extract the text of the form. Traditionally, form data was entered manually for statistical analysis. With the development of artificial intelligence and pattern recognition, obtaining the text of a form is no longer limited to manual entry or scanning. Instead, computer technology can recognize form information automatically, which can greatly improve office efficiency and make daily life more convenient.

After reviewing the state of form recognition research in China and abroad, this thesis studies two problems within a deep learning framework based on convolutional neural networks (CNNs): the localization of form documents under natural conditions, and the recognition of printed characters in a specific type of form. The main research work is as follows:

(1) Research on a form localization algorithm based on a residual network

For the form localization problem, the thesis compares a common CNN model with a residual network model and selects the residual network. The residual network is then combined with dilated convolution and dropout to locate the form more accurately. To make the network generalize better, the documents used in the experiments include not only forms but also papers, magazines, and so on, so that localization does not depend on the content of the document itself. Form localization is modeled as the detection of eight feature points: the top-left (TL), top-right (TR), bottom-right (BR), and bottom-left (BL) corners, plus four midpoints. First, a dilated residual CNN combined with a midpoint constraint roughly locates the four corner points of the form. Then, according to these four points, the form image is divided into four areas. Finally, the corners are recursively refined by the network to complete the localization of the form.

(2) Research on a Chinese character recognition method for a specific type of form based on Gabor features and a CNN

Two approaches are used for character recognition in a specific type of form: Tesseract-OCR, and a method based on Gabor features and an improved LeNet network. English letters and digits have simple structures, and Tesseract-OCR recognizes them well. Chinese characters, however, are numerous and structurally complex, and the accuracy of Tesseract-OCR on them is not high enough. The thesis therefore proposes a method based on Gabor feature extraction and an improved LeNet for Chinese character recognition. Experimental results show that the proposed method achieves high accuracy on Chinese characters.

The thesis also works out a complete implementation process from form localization to form character recognition. With Qt as the development platform and C++ as the development language, a Chinese character recognition system for specific types of forms is implemented by invoking the convolutional neural network model.
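The geometric part of the localization step in (1) can be illustrated with a small sketch. The abstract does not say how the midpoint constraint is formulated or how the image is divided into four areas, so everything below is an assumption made for illustration: the four midpoints are taken to be the midpoints of the form's edges, the constraint is written as a mean squared residual, and the image is split at the centroid of the rough corners. The function names `edge_midpoints`, `midpoint_residual`, and `corner_regions` are hypothetical, and the network-based recursive refinement itself is not shown.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

# Corner order follows the abstract: TL, TR, BR, BL (clockwise).
CORNERS = ("TL", "TR", "BR", "BL")


def edge_midpoints(corners: Dict[str, Point]) -> Dict[str, Point]:
    """Midpoint of each edge of the quadrilateral TL-TR-BR-BL.

    Assumption: the "four midpoints" in the abstract are edge midpoints.
    """
    mids = {}
    for i, name in enumerate(CORNERS):
        nxt = CORNERS[(i + 1) % 4]
        (x0, y0), (x1, y1) = corners[name], corners[nxt]
        mids[f"{name}-{nxt}"] = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    return mids


def midpoint_residual(corners: Dict[str, Point],
                      predicted_mids: Dict[str, Point]) -> float:
    """Mean squared distance between predicted midpoints and the
    midpoints implied by the predicted corners.

    Assumption: one plausible way to express a midpoint constraint.
    """
    implied = edge_midpoints(corners)
    total = 0.0
    for key, (ix, iy) in implied.items():
        px, py = predicted_mids[key]
        total += (px - ix) ** 2 + (py - iy) ** 2
    return total / len(implied)


def corner_regions(corners: Dict[str, Point],
                   width: int, height: int) -> Dict[str, Box]:
    """Split the image into four rectangles, one per rough corner.

    Assumption: the split point is the centroid of the four rough
    corners, clamped so that every rectangle is non-empty.
    """
    cx = sum(p[0] for p in corners.values()) / 4.0
    cy = sum(p[1] for p in corners.values()) / 4.0
    cx = min(max(int(round(cx)), 1), width - 1)
    cy = min(max(int(round(cy)), 1), height - 1)
    return {
        "TL": (0, 0, cx, cy),
        "TR": (cx, 0, width, cy),
        "BR": (cx, cy, width, height),
        "BL": (0, cy, cx, height),
    }


if __name__ == "__main__":
    rough = {"TL": (10.0, 20.0), "TR": (110.0, 20.0),
             "BR": (110.0, 220.0), "BL": (10.0, 220.0)}
    print(edge_midpoints(rough))
    print(corner_regions(rough, width=120, height=240))
```

In a refinement loop of the kind the abstract describes, each of the four rectangles would be cropped and passed back to the network to re-estimate its corner. That step depends on the trained model and is left out here.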
Keywords/Search Tags:form character recognition, form localization, CNN, Tesseract-OCR