Design And Implementation Of Image Data Set Generation System Of Form Document Based On Paper Simulation

Posted on: 2022-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: J S Yang
Full Text: PDF
GTID: 2518306341452954
Subject: Computer technology
Abstract/Summary:
Forms are a common information carrier that must be processed frequently. Because of their special logical structure, a machine cannot process a form with simple text recognition alone. With the rapid development of Internet technology and the improved hardware of mobile devices such as phones and tablets, more and more forms arrive not as electronic documents but as form document images captured by cameras. How to make a machine understand the table structure and content in such an image has become a widely discussed issue in academia and industry. Analyzing table structure with deep learning can greatly reduce the labor cost of processing form document images and promote document process automation. To get the best results from a deep learning algorithm, besides improving the algorithm itself, the most important requirement is a fully labeled image data set of form documents for training and evaluation.

The form document image data set generation system produces diversified form document image data sets for training and evaluating deep learning algorithms. The system has two main functions. The first generates standard form document images by converting and parsing LaTeX files, and annotates the tables in those images with a character matching algorithm, yielding a standard form document image data set. The second builds a simulated paper model that reproduces paper structure and crease distortion, renders the document image onto the paper model with realistic rendering, and outputs the result through a camera viewport as a simulated form document image. This imitates real paper form document images captured with the cameras of mobile devices such as phones and tablets. The form structure in the simulated image is then annotated through a texture mapping algorithm, yielding a simulated form document image data set. In addition, a graphical user interface based on Python Qt was designed; it can be deployed on a platform that runs Python to facilitate user interaction.

The thesis describes in detail the design and implementation of a form document image data set generation system based on paper simulation. First, a requirement analysis was carried out according to the usage scenarios of the form document image data set, and the overall architecture of the system was designed. Second, the technologies relevant to system development were investigated, and the technical realization of each functional module was determined. Third, the processing flow of each functional module was analyzed further and the code was written. Finally, the system was tested against the functional requirements, the experimental results were analyzed, and the system was optimized. The test results show that the system meets the functional requirements of data set generation and runs stably in a real environment.
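As an illustration of how annotations might be carried from the flat page to the simulated image, the following is a minimal sketch and not the thesis's implementation. It assumes a sheet with a single vertical crease and a pinhole camera looking straight down; the function names (`fold_surface`, `project`, `map_cell`), the page size, and the camera parameters are all hypothetical. Each cell corner, given in page texture coordinates, is placed on the folded sheet and then projected into the image.

```python
import math

# Hypothetical A4 page size in metres (an assumption for this sketch).
PAGE_W, PAGE_H = 0.21, 0.297


def fold_surface(u, v, crease_u=0.5, fold_deg=20.0):
    """Map texture coordinates (u, v) in [0, 1] to a 3D point on a sheet
    with one vertical crease. The part to the right of the crease is
    rotated upward around the crease line by fold_deg degrees."""
    x = u * PAGE_W
    y = v * PAGE_H
    crease_x = crease_u * PAGE_W
    if x <= crease_x:
        return (x, y, 0.0)  # flat half of the sheet
    d = x - crease_x  # distance from the crease along the sheet
    a = math.radians(fold_deg)
    return (crease_x + d * math.cos(a), y, d * math.sin(a))


def project(point, cam_height=0.5, focal_px=800.0, cx=320.0, cy=240.0):
    """Project a 3D point with a pinhole camera placed cam_height above
    the page centre and looking straight down at the plane z = 0."""
    x, y, z = point
    depth = cam_height - z  # distance from the camera along its axis
    px = cx + focal_px * (x - PAGE_W / 2) / depth
    py = cy + focal_px * (y - PAGE_H / 2) / depth
    return (px, py)


def map_cell(cell_uv):
    """Transfer the corners of one table cell from page texture
    coordinates to pixel coordinates in the simulated image."""
    return [project(fold_surface(u, v)) for u, v in cell_uv]


if __name__ == "__main__":
    # A cell that straddles the crease in the middle of the page.
    cell = [(0.4, 0.4), (0.6, 0.4), (0.6, 0.6), (0.4, 0.6)]
    for corner in map_cell(cell):
        print(corner)
```

Because the annotation follows the same mapping as the rendered texture, the labels stay aligned with the distorted table without manual relabeling. A real paper model would replace `fold_surface` with the mesh used by the renderer.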
Keywords/Search Tags:Form Structure Recognition, Data Set Construction, Form Document Image, Image Processing