
Form File Identification And Understanding

Posted on: 2007-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: L L Gao
GTID: 2208360185982298
Subject: Computer software and theory
Abstract/Summary:
This thesis studies the recognition and understanding of table-form documents. A table-form document is treated as being composed of many invisible grids, so that the whole document can be formatted and the relationships among the grids studied. By analyzing the constraints among the grids, a table-form document can be processed automatically. The thesis focuses on two aspects: the physical and the logical.

The first aspect is the design of an algorithm for recognizing the physical structure of table-form documents. Broken frame lines, frame lines that intersect handwriting, and skew introduced during scanning all make the frame of a form harder to recognize, so a robust algorithm is needed to address these problems. Many skew-correction algorithms exist, but most are suited to documents without tables. Since the table is the main part of a table-form document, this thesis presents a skew-correction algorithm that is more effective for such documents. An algorithm for physical structure recognition is also implemented, and its results are presented. The last step of physical structure recognition is vectorization, which represents a form compactly by filtering redundant information out of the document image.

The second aspect is the analysis of the logical structure of table-form documents. Cells are divided into categories according to their function in a form: for instance, some cells accept input from users or customers, while others indicate what should be written in other cells. The logical relationships between cells can therefore be derived from their functions, and these relationships can be described by extensible rules.

Physical recognition and logical analysis are both studied in the context of the automatic processing of table-form documents, with the emphasis placed on the algorithms.
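The idea of classifying cells by function and linking them through extensible rules can be illustrated with a small sketch. This is not the thesis's algorithm: the `Cell` type, the `classify` heuristic (blank cells accept input, non-blank cells are labels), and the two neighbor rules are all hypothetical names and assumptions chosen only to show how a rule list might be extended.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    text: str  # recognized content; empty string for a blank cell


Grid = Dict[Tuple[int, int], Cell]


def classify(cell: Cell) -> str:
    """Assumed heuristic: a blank cell accepts input, any other cell is a label."""
    return "input" if cell.text.strip() == "" else "label"


# A rule maps a label cell to the input cell it describes, or None if it does not apply.
Rule = Callable[[Cell, Grid], Optional[Cell]]


def right_neighbor(label: Cell, grid: Grid) -> Optional[Cell]:
    """Hypothetical rule: a label describes the input cell directly to its right."""
    cell = grid.get((label.row, label.col + 1))
    return cell if cell is not None and classify(cell) == "input" else None


def below_neighbor(label: Cell, grid: Grid) -> Optional[Cell]:
    """Hypothetical rule: a label describes the input cell directly below it."""
    cell = grid.get((label.row + 1, label.col))
    return cell if cell is not None and classify(cell) == "input" else None


# New rules can be appended here without changing link_cells.
DEFAULT_RULES: Tuple[Rule, ...] = (right_neighbor, below_neighbor)


def link_cells(cells: Sequence[Cell],
               rules: Sequence[Rule] = DEFAULT_RULES) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (label text, input cell position) pairs using the first rule that matches."""
    grid: Grid = {(c.row, c.col): c for c in cells}
    links = []
    for cell in cells:
        if classify(cell) != "label":
            continue
        for rule in rules:
            target = rule(cell, grid)
            if target is not None:
                links.append((cell.text, (target.row, target.col)))
                break
    return links
```

Under these assumptions, a form with "Name" and "Date" labels in the first column and blank cells in the second would link each label to the blank cell on its right. Rules are tried in order, so placing `right_neighbor` first is a design choice of this sketch, not something the thesis states.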
Keywords/Search Tags: table-form documents, skewness rectification, physical structure recognition, logical structure analysis