
Extraction And Analysis Of Table And Graph In The Document Image With Complex Layout

Posted on: 2016-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: J X Bian
Full Text: PDF
GTID: 2348330488474149
Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the advent of the information age, digitizing documents has become an important trend in information technology. A table is a highly condensed form of textual information that can contain both text and graphs; it is standardized, concise, and easy to process. A graph is a vivid illustration of the text: content that is obscure in words can be expressed through a graph. Robust and precise extraction and analysis of tables and graphs in document images with complex layouts is therefore key to document digitization. This thesis studies the problem in depth; the main contents are as follows:

1. The document image is preprocessed according to the characteristics of tables and graphs. The image is binarized with a combination of Huang's fuzzy thresholding algorithm and the Otsu algorithm; experimental results show that this combination is robust to noise and reduces breaks in the table lines. Skewed binary document images are then corrected with an algorithm based on the Hough transform and morphological operations, and experiments confirm the validity of this algorithm.

2. Table analysis is divided into three parts: table line extraction, cell extraction, and table reconstruction. In the first part, the table frame lines are extracted with a morphological algorithm, which extracts the table frame accurately and locates the table area. In the second part, the extracted frame lines are thinned so that cells can be extracted effectively. After comparing the results of the Hilditch, Rosenfeld, index-table lookup, and parallel thinning algorithms, the parallel thinning algorithm is adopted, and its shortcoming of incomplete thinning is remedied. A table feature point based algorithm is then used to extract the cells, and the Hough transform is used to detect diagonal lines (slashes) within the cells. In the third part, connected-component scanning is carried out to locate the text within the cells, and the table is reconstructed by integrating the table lines and the diagonal lines.

3. Graphs are extracted with an algorithm based on contour tracing, which can extract graphs from document images that contain no tables. A set of rules for graphs that resemble tables is proposed, and complex documents with nested content are classified and processed to complete the extraction and analysis of tables and graphs. On a database of 2038 images, the accuracy of table and graph extraction is more than 82%.
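To make the binarization and table line extraction steps more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes plain Otsu thresholding (the thesis combines it with Huang's fuzzy algorithm, which is omitted here) and a morphological opening with a horizontal structuring element of hypothetical length `k`. All function names, and the tiny 5x8 test image, are illustrative assumptions.

```python
def otsu_threshold(pixels):
    """Return a gray level t (0..255) that maximizes between-class variance.

    Pixels <= t form one class (treated as ink below), pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t):
    """Map dark pixels (<= t) to 1 (ink) and bright pixels to 0 (background)."""
    return [[1 if p <= t else 0 for p in row] for row in gray]


def erode_h(img, k):
    """Mark x where the k-pixel horizontal window starting at x is all ink."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - k + 1):
            if all(img[y][x + d] for d in range(k)):
                out[y][x] = 1
    return out


def dilate_h(img, k):
    """Paint the k-pixel window back at every position that survived erosion."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for d in range(k):
                    if x + d < w:
                        out[y][x + d] = 1
    return out


def open_h(img, k):
    """Horizontal opening: keeps only ink runs at least k pixels long."""
    return dilate_h(erode_h(img, k), k)


# Hypothetical 5x8 grayscale image: row 1 is a table line, row 3 holds a short text stroke.
gray = [
    [200] * 8,
    [20] * 8,
    [200] * 8,
    [200, 20, 20, 200, 200, 200, 200, 200],
    [200] * 8,
]
t = otsu_threshold([p for row in gray for p in row])
binary = binarize(gray, t)
h_lines = open_h(binary, 5)  # k = 5 is an assumed minimum line length
```

In this sketch the opening removes ink runs shorter than `k`, so short text strokes disappear while long horizontal lines survive. Vertical lines could be handled the same way with a vertical structuring element, and the union of the two results would approximate the table frame.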
Keywords/Search Tags:table analysis, skew correction, morphological algorithm, graph extraction, contour tracing