
Table Recognition Based On Digital Image Processing

Posted on: 2020-07-02 | Degree: Master | Type: Thesis
Country: China | Candidate: P W Yao
GTID: 2428330575957611 | Subject: Engineering
Abstract/Summary:
Tables record many kinds of information about daily life, production, and society in a variety of forms. With the rapid development of computer and information technology, automatically converting paper tables into editable spreadsheets has become an urgent need. Spreadsheets are steadily gaining ground in applications because they make information convenient to manage, search, and exchange. However, a large amount of information is still stored on paper, and clerical departments find it difficult to convert paper tables into spreadsheets: staff who are not skilled with office software draw spreadsheets and enter text by hand slowly and with frequent mistakes. How to convert paper tables into spreadsheets quickly and accurately has therefore attracted considerable attention.
Building on a broad review of the literature, this thesis further studies the automatic conversion of paper forms into editable spreadsheets. The conversion comprises recognition of table-line and text information, table drawing, and text filling.
Recognition of table-line information and spreadsheet drawing. Tilt correction of the table image is performed with a method that combines contour detection and BRISK feature detection. Horizontal and vertical lines are separated and thinned by morphological processing. Shi-Tomasi corner detection is then used to extract the endpoint coordinates of the table lines. The extracted points are sorted from top to bottom and then from left to right, and the endpoints of the horizontal and vertical lines are aligned through data correction. The table is drawn from the corrected endpoint coordinates.
Recognition of text information and spreadsheet drawing. Text information recognition comprises text positioning and text recognition. Text positioning is based on cell segmentation: each cell is extracted from the original image, and the upper-left corner of its bounding rectangle is recorded. To do this, the corrected endpoint coordinates of the table lines are used to draw an empty table; the rectangular outline of each cell is extracted from the empty table, the corresponding rectangular region is cropped from the original image, and the upper-left coordinates of each rectangle are recorded. Text recognition uses Tesseract-OCR to identify the text in each segmented cell and matches each result with the upper-left coordinates of the corresponding text area, for use in filling the table.
Filling of table text. The absolute differences between the upper-left coordinates of each extracted text area and the endpoint coordinates of the horizontal and vertical lines are computed separately. When these absolute differences are below a threshold, the text is filled into the corresponding cell.
The table recognition system in this thesis consists of a hardware system and a software system. The hardware system converts paper forms into table images. The software system extracts table-line and text information from these images and uses it to reconstruct the table in Word.
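Tilt correction in the thesis combines contour detection with BRISK feature detection. The minimal sketch below covers only the final geometric step and assumes the two upper corners of the table's outer border have already been located (for example from the outer contour); BRISK matching is not reproduced, and the helper names `skew_angle` and `rotate_point` are hypothetical. It measures how far the top border deviates from horizontal and rotates coordinates back so the border is level.

```python
import math


def skew_angle(top_left, top_right):
    """Angle in degrees of the table's top border relative to the x-axis.

    Points are (x, y) image coordinates. The two corners are assumed to come
    from an earlier step such as outer-contour detection (not shown here).
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(p, center, angle_deg):
    """Rotate p about center by -angle_deg, which brings a border measured
    at angle_deg back onto the horizontal."""
    a = math.radians(-angle_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))
```

For whole images, the same correction is usually applied with an image library (for instance OpenCV's `cv2.getRotationMatrix2D` followed by `cv2.warpAffine`); libraries differ in their sign conventions, so the rotation direction should be checked against a test image.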
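Separating horizontal from vertical lines by morphology typically relies on an opening with a long, thin structuring element: a 1 x k element keeps only horizontal ink runs at least k pixels long, and the transposed element does the same for vertical runs. The dependency-free sketch below assumes a binarized image stored as a list of 0/1 rows and a kernel length `k` chosen longer than any character stroke but shorter than the shortest table line; the thinning step is not shown. In practice OpenCV's `cv2.getStructuringElement` with `cv2.morphologyEx(..., cv2.MORPH_OPEN, ...)` is a common choice, though the thesis does not state its exact kernels.

```python
# Binary image: list of rows, 1 = ink (line pixel), 0 = background.

def _erode_row(row, k):
    """1-D erosion: keep pixel i only if the length-k run anchored at it is all ink."""
    n, half = len(row), k // 2
    out = [0] * n
    for i in range(n):
        lo, hi = i - half, i - half + k
        # Pixels outside the image count as background.
        if lo >= 0 and hi <= n and all(row[lo:hi]):
            out[i] = 1
    return out


def _dilate_row(row, k):
    """1-D dilation with the same length-k element and anchor."""
    n, half = len(row), k // 2
    out = [0] * n
    for i in range(n):
        if row[i]:
            for j in range(max(0, i - half), min(n, i - half + k)):
                out[j] = 1
    return out


def open_rows(img, k):
    """Opening with a 1 x k element: removes ink runs shorter than k, keeps longer ones."""
    return [_dilate_row(_erode_row(row, k), k) for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def split_lines(img, k):
    """Return (horizontal, vertical) line masks via row-wise and column-wise opening."""
    horizontal = open_rows(img, k)
    vertical = transpose(open_rows(transpose(img), k))
    return horizontal, vertical
```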
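Shi-Tomasi corner detection scores each pixel by the smaller eigenvalue of the local structure tensor, λ_min = (Sxx + Syy)/2 − sqrt(((Sxx − Syy)/2)² + Sxy²), where Sxx, Syy, Sxy are sums of gradient products over a small window. Along a straight table line only one gradient direction is present, so λ_min is near zero; at a line intersection both directions are present and λ_min is large. The pure-Python sketch below assumes central-difference gradients, a window radius `r`, and a fixed `threshold`, all illustrative choices; OpenCV's `cv2.goodFeaturesToTrack` is the usual library implementation and additionally applies non-maximum suppression and a minimum distance between corners.

```python
import math


def shi_tomasi_response(img, x, y, r=1):
    """Smaller eigenvalue of the structure tensor summed over a (2r+1)^2 window.

    img is a list of rows of numbers; (x, y) must be at least r + 1 pixels
    from every border so the central differences stay inside the image.
    """
    sxx = sxy = syy = 0.0
    for j in range(y - r, y + r + 1):
        for i in range(x - r, x + r + 1):
            gx = (img[j][i + 1] - img[j][i - 1]) / 2.0  # horizontal gradient
            gy = (img[j + 1][i] - img[j - 1][i]) / 2.0  # vertical gradient
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - disc


def shi_tomasi_corners(img, threshold, r=1):
    """All interior pixels whose response exceeds threshold (no non-max suppression)."""
    h, w = len(img), len(img[0])
    pts = []
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            if shi_tomasi_response(img, x, y, r) > threshold:
                pts.append((x, y))
    return pts
```

Because this sketch omits non-maximum suppression, several neighbouring pixels can fire around one intersection, which is one reason a later clustering or correction step on the coordinates is useful.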
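The sorting and data-correction step can be illustrated by snapping near-equal coordinates to a shared value, so that endpoints on the same horizontal line get the same y and endpoints on the same vertical line get the same x, and then ordering the points top to bottom and left to right. The sketch below is one possible reading, assuming a simple gap-based clustering with a hypothetical tolerance `tol` in pixels; the thesis does not specify its correction rule, and `snap_values` and `align_and_sort` are illustrative names.

```python
def snap_values(values, tol):
    """Map each 1-D coordinate to the rounded mean of its cluster.

    Distinct values are sorted; a new cluster starts whenever the gap to the
    previous value exceeds tol. tol should stay well below the smallest
    spacing between neighbouring table lines, or separate lines will merge.
    """
    groups = []
    for v in sorted(set(values)):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    mapping = {}
    for g in groups:
        mean = round(sum(g) / len(g))
        for v in g:
            mapping[v] = mean
    return mapping


def align_and_sort(points, tol=5):
    """Align endpoints, then order them top to bottom and, within a row, left to right."""
    xs = snap_values([x for x, _ in points], tol)
    ys = snap_values([y for _, y in points], tol)
    aligned = {(xs[x], ys[y]) for x, y in points}  # duplicate corners collapse
    return sorted(aligned, key=lambda p: (p[1], p[0]))
```

For example, the slightly scattered corners `(10, 49)`, `(59, 51)`, `(101, 50)` on one line all end up with y = 50 after alignment.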
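Cell segmentation and text filling can be sketched once the aligned line coordinates are known. The version below assumes a regular grid with no merged cells, so every aligned x is a full vertical line and every aligned y a full horizontal line; each OCR result is represented as a hypothetical `(text, (x, y))` pair giving the upper-left corner of its text area. A string is placed in a cell when both absolute differences to that cell's left and top line coordinates are below `threshold`, following the rule described in the abstract; the default of 10 pixels is an arbitrary assumption.

```python
def cell_boxes(col_xs, row_ys):
    """Cell rectangles (x0, y0, x1, y1) of a regular grid, listed row by row.

    Assumes col_xs and row_ys are sorted aligned line coordinates and that
    every line spans the whole table (no merged cells).
    """
    return [(col_xs[c], row_ys[r], col_xs[c + 1], row_ys[r + 1])
            for r in range(len(row_ys) - 1)
            for c in range(len(col_xs) - 1)]


def fill_cells(col_xs, row_ys, ocr_results, threshold=10):
    """Place each OCR string into the cell whose left and top line coordinates
    are both within threshold of the text area's upper-left corner."""
    table = [["" for _ in range(len(col_xs) - 1)] for _ in range(len(row_ys) - 1)]
    for text, (tx, ty) in ocr_results:
        for r in range(len(row_ys) - 1):
            for c in range(len(col_xs) - 1):
                if abs(tx - col_xs[c]) < threshold and abs(ty - row_ys[r]) < threshold:
                    table[r][c] = text
    return table
```

The strings themselves would come from running Tesseract on each cropped cell, for example via `pytesseract.image_to_string`; that call is not included here so the sketch stays dependency-free.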
Keywords/Search Tags: table recognition, tilt correction of table images, table drawing, Tesseract-OCR, table reconstruction