Design And Implementation Of Automatic Recognition Method For Irregular Forms

Posted on: 2021-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Xu
Full Text: PDF
GTID: 2518306122468494
Subject: Control Engineering

Abstract/Summary:
At present, text recognition on document images is mature, and many commercial products recognize the characters in document images well, but the recognition of tables in documents still needs improvement. When a table is distorted or the illumination is uneven, the ideal recognition result may not be obtained. This thesis analyzes table recognition techniques from China and abroad, focuses on the binarization of low-quality document images and the geometric deformation correction of table images, and finally implements a robust table recognition algorithm. The main work of this thesis is as follows:

1) The binarization of document images is studied. A binarization algorithm based on background estimation is proposed to address uneven illumination in document images. The stroke width is estimated with the stroke width transform, and it determines the size of the local window used during binarization. An improved local binarization algorithm and a global thresholding method are applied to the images after background compensation. Experimental results show that the algorithm suppresses background noise to a certain extent and classifies foreground and background correctly.

2) The distortion correction of table images is studied. A statistical algorithm for line extraction is proposed, and the extracted line segments are refined by fitting the lines of the table. The intersection points between line segments are used to apply a perspective transform to the table image, and distortion correction is then carried out by fitting the border lines of the table.

3) From the intersections between the row lines and column lines of the table, the set of all feature points of the table is obtained. The vertex coordinates and vertex characteristics of each cell are determined from the relationships between these feature points.

4) The table in the document image is converted into an editable spreadsheet. All rows and columns are sorted; table lines that are not connected but belong, or nearly belong, to the same row or column are treated as the same row or column; and the total numbers of rows and columns are determined. The rows and columns of the four vertices of each cell are then determined, cells that span multiple rows or columns are merged to obtain the frame structure of the table, and the table information, combined with the recognized characters of each cell, is written to Excel to reconstruct the table.

Building on this work, the thesis implements an image-based form recognition system and designs a concise and friendly interactive interface. Experimental results show that the recognition rate on distorted form images can reach 85%, which is higher than that of some commercial form recognition software.
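
The row/column grouping and cell-merging idea in item 4 can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the pixel tolerance `tol`, and the `(left, top, right, bottom)` cell representation are all assumptions made for illustration. Line positions that nearly coincide are clustered into one logical row or column line, and each cell's span is read off from the indices of the lines nearest to its vertices.

```python
from typing import List, Tuple


def cluster_positions(positions: List[float], tol: float) -> List[float]:
    """Group 1-D line positions that lie within `tol` of the current
    cluster's mean, and return the sorted cluster centers.
    `tol` is a hypothetical pixel tolerance, not a value from the thesis."""
    groups: List[List[float]] = []
    for p in sorted(positions):
        if groups and p - sum(groups[-1]) / len(groups[-1]) <= tol:
            groups[-1].append(p)   # nearly the same row/column line
        else:
            groups.append([p])     # start a new row/column line
    return [sum(g) / len(g) for g in groups]


def nearest_index(centers: List[float], value: float) -> int:
    """Index of the clustered line closest to a vertex coordinate."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def cell_span(cell: Tuple[float, float, float, float],
              row_lines: List[float],
              col_lines: List[float]) -> Tuple[int, int, int, int]:
    """Map a cell given as (left, top, right, bottom) to
    (row, col, rowspan, colspan) on the clustered grid."""
    left, top, right, bottom = cell
    r0, r1 = nearest_index(row_lines, top), nearest_index(row_lines, bottom)
    c0, c1 = nearest_index(col_lines, left), nearest_index(col_lines, right)
    return (r0, c0, r1 - r0, c1 - c0)


# Illustrative data: y positions of horizontal segments, x positions of vertical ones.
rows = cluster_positions([10, 11, 50, 49, 90], tol=3)
cols = cluster_positions([5, 6, 100, 101, 200], tol=3)
merged = cell_span((5, 10, 200, 50), rows, cols)    # spans two columns
single = cell_span((6, 49, 101, 90), rows, cols)    # ordinary 1x1 cell
```

A cell whose `rowspan` or `colspan` exceeds 1 corresponds to a merged cell in the output spreadsheet. Writing the result to Excel is left out here, since the abstract does not say which library the thesis uses.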
Keywords/Search Tags: Form identification, Binarization algorithm, Table line detection, Distortion correction, Form refactoring