Research And Implementation Of Form Recognition System Based On Deep Learning

Posted on:2019-05-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Xiong

GTID:2428330572463629

Subject:Computer technology

Abstract/Summary:

Forms exist in both paper and digital (electronic) form in daily life. Paper forms are easier to read and share, while digital forms are easier to index and store. For effective knowledge management, a paper form usually has to be converted to a digital format first, and the text information in the form then has to be extracted. Traditionally, form data was entered manually before it could be statistically analyzed. With the research and development of artificial intelligence and pattern recognition, the text information of forms is no longer obtained only by manual entry or scanning. Instead, computers are used to recognize form information automatically, which can greatly improve office efficiency and make daily life more convenient.

In this thesis, after surveying the state of form recognition research in China and abroad, we study two problems within a deep learning framework based on convolutional neural networks (CNNs): the localization of form documents under natural conditions, and the recognition of printed characters in forms of a specific type. The main research work is as follows:

(1) Research on a form localization algorithm based on a residual network

For the form localization problem, we compare a common CNN model with a residual network model and select the residual network. The residual network is further combined with dilated convolution and dropout to locate the form more accurately. To improve generalization, the experimental documents include not only forms but also papers, magazines, and other documents, so the localization does not depend on the content of the documents themselves. Form localization is modeled as the detection of eight feature points: the top-left (TL), top-right (TR), bottom-right (BR), and bottom-left (BL) corners, plus four midpoints. First, a dilated residual CNN combined with a midpoint constraint roughly locates the four corner points of the form. Next, the form image is divided into four regions based on these four points. Finally, the network recursively refines each corner to complete the localization of the form.

(2) Research on a Gabor- and CNN-based method for recognizing Chinese characters in forms of a specific type

Tesseract-OCR and a method based on Gabor features and an improved LeNet network are used for character recognition in forms of a specific type. English letters and digits have simple structures, and Tesseract-OCR recognizes them well. Chinese characters, however, are numerous and structurally complex, and Tesseract-OCR does not recognize them accurately enough. This thesis therefore proposes a Chinese character recognition method based on Gabor feature extraction and an improved LeNet. Experimental results show that this method achieves high accuracy on Chinese characters.

In addition, the thesis builds a complete processing pipeline from form localization to form character recognition. Using the Qt development platform and the C++ language, we implemented a Chinese character recognition system for specific types of forms that invokes the convolutional neural network model.
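
The geometry around the localization network in (1) can be illustrated with a minimal sketch. Several details are assumptions that the abstract does not state. The "four midpoints" are taken to be the midpoints of the edges TL-TR, TR-BR, BR-BL and BL-TL. The midpoint constraint is modeled as a simple consistency penalty. The image is split into four regions at the centroid of the rough corners. Each refinement step re-runs the network on a window half the size of the previous one, centred on the current estimate. The CNN itself is replaced by a hypothetical `predict` callback, and `edge_midpoints`, `midpoint_penalty`, `split_regions` and `refine_corner` are illustrative names, not the thesis's code.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def edge_midpoints(corners: List[Point]) -> List[Point]:
    """Midpoints of edges TL-TR, TR-BR, BR-BL, BL-TL (assumed meaning of the four midpoints)."""
    mids = []
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        mids.append(((x0 + x1) / 2, (y0 + y1) / 2))
    return mids


def midpoint_penalty(corners: List[Point], predicted_mids: List[Point]) -> float:
    """Hypothetical midpoint constraint: mean distance between the predicted midpoints
    and the midpoints implied by the predicted corners (0 when all eight points agree)."""
    implied = edge_midpoints(corners)
    return sum(math.dist(a, b) for a, b in zip(implied, predicted_mids)) / 4


def split_regions(corners: List[Point], width: float, height: float) -> List[Box]:
    """Divide the image into four regions (TL, TR, BR, BL order) at the corners' centroid."""
    cx = sum(x for x, _ in corners) / 4
    cy = sum(y for _, y in corners) / 4
    return [(0, 0, cx, cy), (cx, 0, width, cy), (cx, cy, width, height), (0, cy, cx, height)]


def refine_corner(region: Box, predict: Callable[[Box], Point], steps: int = 3) -> Point:
    """Iterative form of the recursive refinement: re-predict one corner inside a window
    half as wide and half as high, centred on the current estimate and clipped to the
    previous window."""
    x0, y0, x1, y1 = region
    point = predict(region)
    for _ in range(steps):
        half_w, half_h = (x1 - x0) / 4, (y1 - y0) / 4
        x0, x1 = max(x0, point[0] - half_w), min(x1, point[0] + half_w)
        y0, y1 = max(y0, point[1] - half_h), min(y1, point[1] + half_h)
        point = predict((x0, y0, x1, y1))
    return point


if __name__ == "__main__":
    rough = [(12.0, 15.0), (185.0, 10.0), (190.0, 285.0), (8.0, 290.0)]  # coarse TL, TR, BR, BL
    true_tl = (10.0, 12.0)

    def fake_tl_model(box: Box) -> Point:
        # Hypothetical stand-in for the CNN: returns the true TL corner clamped to the window.
        bx0, by0, bx1, by1 = box
        return (min(max(true_tl[0], bx0), bx1), min(max(true_tl[1], by0), by1))

    regions = split_regions(rough, 200, 300)
    print(refine_corner(regions[0], fake_tl_model))  # (10.0, 12.0)
```

Shrinking the window at each step lets a network with a fixed input size look at the corner at progressively higher effective resolution. That is one plausible reading of "recursively refined", not a confirmed detail of the thesis.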

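The Gabor stage of the recognition method in (2) can be sketched as a small filter bank applied to a normalized grayscale character image. This is a minimal sketch under assumptions that the abstract does not state. It uses the real part of the standard Gabor function with four orientations, a 9x9 kernel, sigma = 2, wavelength lambd = 4 and aspect ratio gamma = 0.5. It also assumes that each orientation's response map becomes one input channel of the improved LeNet. `gabor_kernel`, `filter_response` and `gabor_maps` are illustrative names, not the thesis's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def gabor_kernel(ksize: int, sigma: float, theta: float, lambd: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Matrix:
    """Real part of a 2-D Gabor filter sampled on an odd ksize x ksize grid."""
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_response(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (no padding) of an image with a square kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + u][j + v] * kernel[u][v] for u in range(k) for v in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def gabor_maps(image: Matrix, n_orientations: int = 4, ksize: int = 9,
               sigma: float = 2.0, lambd: float = 4.0) -> List[Matrix]:
    """One response map per orientation theta = n * pi / n_orientations (assumed CNN channels)."""
    return [filter_response(image, gabor_kernel(ksize, sigma, n * math.pi / n_orientations, lambd))
            for n in range(n_orientations)]


if __name__ == "__main__":
    # Vertical stripes with a 4-pixel period: intensity varies along x only.
    stripes = [[math.cos(math.pi * x / 2) for x in range(16)] for _ in range(16)]
    for n, m in enumerate(gabor_maps(stripes)):
        mean_abs = sum(abs(v) for row in m for v in row) / (len(m) * len(m[0]))
        print(f"theta = {n} * pi/4: mean |response| = {mean_abs:.3f}")
```

In this demo, the theta = 0 filter's carrier oscillates along x, so it responds far more strongly to the vertical stripes than the theta = pi/2 filter does. This orientation selectivity is the usual motivation for using Gabor features on stroke-rich Chinese characters.
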
Keywords/Search Tags:

form character recognition, form localization, CNN, Tesseract-OCR

Related items

1	Design And Implementation Of Image Form Data Recognition System Based On OCR Technology
2	Research On Form Recognition
3	Image-based Form Recognition Algorithm And Automatic Entry System
4	Constrained Form Recognition System
5	Research On Pre-processing And Character Extraction Of Form Document Recognition
6	Research On Form Structure Recognition Based On Image Technology
7	Research On Form And Chinese Characters Recognition In Printed Chinese Document Recognition System
8	Research On Form Recognition In Printed Document Recognition System
9	Research On Auto-recognition For Chinese Hand-written Commercial Characters In Form Processing
10	Research On Fast Recognition Method Of Handwritten Form Digital String Based On Self Learning