
Automatic Table Structure Identification and Analysis in General Document Images

Posted on: 2004-11-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G S Shi
Full Text: PDF
GTID: 1118360185497015
Subject: Control theory and control engineering
Abstract/Summary:
Tables are widely used in many kinds of documents. A table is a well-formatted way to arrange and present data, and it makes data query, indexing and calculation convenient. For these reasons, automatic table processing has become an important branch of document image analysis (DIA) in recent years. Many problems in this domain remain open, arising from the complexity and variability of tables as well as from image quality. Extracting table structure from a general document image automatically and efficiently is a major challenge for researchers.

This dissertation proposes a novel model to describe general table structure. The model reflects the core characteristics of table structure. A simple region set is used to describe the table layout structure, while a cell chain is used to describe the table logical structure. The grid matrix provides a formal data representation of the layout, and all of the information in the grid matrix can be easily queried for logical analysis.

A novel performance evaluation mechanism is also proposed to address table groundtruthing, a well-known open problem in DIA. The difficulty of this task is comparable to that of the "graph matching/isomorphism" problem. We use an intersection point matrix to compare the processing result with the standard result. This method converts the open problem into a bounded, computable one, a 2-D matrix matching problem, and provides an automatic and efficient way to evaluate the performance of an automatic table processing system.

Advances in machine intelligence are reflected in complex, stable and efficient processing systems. We built a complete automatic table processing system on top of an existing DIA system (RTK). The model-driven processing flow and the different algorithms are organized in a well-designed architecture. The system was tested on a large sample set containing about 2,000 pages of document images collected from various domains. The system performed satisfactorily, which supports the effectiveness of our model and processing system. The test results are provided at the end of this dissertation.
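
The matrix-based evaluation described above can be illustrated with a minimal sketch. The abstract does not specify how the grid matrix or the intersection point matrix is encoded, so everything here is an assumption made for illustration: the grid matrix is taken to hold a cell id per grid unit, each intersection point is encoded by which of its four arms carries a cell boundary, and the score is the fraction of matching entries. The names `intersection_matrix` and `match_score` are hypothetical and are not taken from the dissertation or from RTK.

```python
from typing import List, Optional

# Assumed encoding: a grid matrix stores, for every grid unit, the id of the
# cell that covers it. A merged cell repeats its id over several units.
Grid = List[List[int]]

# Bit flags for the four arms that may leave an intersection point.
UP, DOWN, LEFT, RIGHT = 1, 2, 4, 8


def cell_at(grid: Grid, r: int, c: int) -> Optional[int]:
    """Return the cell id at grid unit (r, c), or None outside the table."""
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        return grid[r][c]
    return None


def intersection_matrix(grid: Grid) -> List[List[int]]:
    """Build a (rows+1) x (cols+1) matrix of arm codes, one per grid corner.

    An arm exists where the two grid units on either side of it belong to
    different cells (or one of them lies outside the table).
    """
    rows, cols = len(grid), len(grid[0])
    result = []
    for i in range(rows + 1):
        row = []
        for j in range(cols + 1):
            nw = cell_at(grid, i - 1, j - 1)
            ne = cell_at(grid, i - 1, j)
            sw = cell_at(grid, i, j - 1)
            se = cell_at(grid, i, j)
            code = 0
            if nw != ne:
                code |= UP      # boundary running upward from the point
            if sw != se:
                code |= DOWN    # boundary running downward
            if nw != sw:
                code |= LEFT    # boundary running to the left
            if ne != se:
                code |= RIGHT   # boundary running to the right
            row.append(code)
        result.append(row)
    return result


def match_score(a: List[List[int]], b: List[List[int]]) -> float:
    """Fraction of intersection points with identical codes.

    Assumption: matrices of different shape are scored 0.0 rather than aligned.
    """
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    same = sum(p == q for x, y in zip(a, b) for p, q in zip(x, y))
    return same / total


# Ground truth: the top row is one merged cell spanning two columns.
truth = [[1, 1],
         [2, 3]]
# Hypothetical system output that missed the merge.
detected = [[1, 2],
            [3, 4]]

score = match_score(intersection_matrix(truth), intersection_matrix(detected))
```

In this sketch the comparison reduces to an element-wise check of two small integer matrices, which is the sense in which a structural comparison becomes a bounded 2-D matrix matching problem. A real evaluation would also need to align grids of different sizes, which this sketch deliberately does not attempt.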
Keywords/Search Tags: Automatic Table Processing, Layout Identification and Analysis, Simple Region Set, Grid Matrix, Intersection Point Matrix