
Research On Form Document Image Analysis

Posted on: 2014-02-15 | Degree: Master | Type: Thesis
Country: China | Candidate: B Liu | Full Text: PDF
GTID: 2248330392960986 | Subject: Information and Communication Engineering
Abstract/Summary:
Form documents are widely used in daily life because they are a concise, standardized kind of document that is easy to fill in and process. With the arrival of the information age, electronic documents have become an inevitable trend. Accordingly, automatic form document processing systems have attracted extensive attention from scholars and researchers in China and abroad. In general, a form document processing system can be divided into two main functional parts: form document classification and information extraction. Once the category of an input form document is determined, critical information can be extracted using prior knowledge of the template form it matches. Form document classification is therefore essential for correct information extraction in the next stage of the system.

This thesis carries out a preliminary study of form document analysis. First, in the document image preprocessing stage, a new skew correction approach for scanned documents based on Haar-like features is proposed, and a modified coarse-to-fine search strategy is implemented to reduce the computation of the skew angle search. Experimental results show that the skew estimation algorithm performs well on general printed documents with different contents, languages, and layouts, and that its skew angle accuracy is comparable with or better than that of state-of-the-art methods. In addition, this thesis constructs a form document classification prototype system based on flexible templates. On the one hand, the system can effectively handle practical problems of fixed-template form document classification, such as translation, rotation, and scale changes. For these spatial changes, a model is established to describe the transformation between a varied form and the original one. The model parameters are then estimated with a Hough voting strategy, and the input form document is normalized using the estimated parameters.
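The coarse-to-fine skew angle search can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the abstract does not specify the Haar-like feature score, so a projection-profile sharpness score (`profile_score`, a hypothetical stand-in) is used in its place, and the angle range, step sizes, and shrink factor are illustrative choices. The search scans ±15° in 1° steps, then repeatedly rescans a narrow window around the best angle with a step five times smaller until the step drops below 0.05°.

```python
import numpy as np


def profile_score(xs: np.ndarray, ys: np.ndarray, angle_deg: float) -> float:
    """Hypothetical stand-in criterion (NOT the thesis's Haar-like score):
    sharpness of the horizontal projection profile after undoing a skew of angle_deg."""
    a = np.deg2rad(angle_deg)
    # Row coordinate of each foreground pixel after rotating text lines back to horizontal.
    rows = np.round(ys * np.cos(a) - xs * np.sin(a)).astype(int)
    counts = np.bincount(rows - rows.min()).astype(float)
    # Sum of squared bin counts peaks when pixels pile up into few rows.
    return float(np.sum(counts ** 2))


def estimate_skew(binary: np.ndarray, max_angle: float = 15.0, coarse_step: float = 1.0,
                  min_step: float = 0.05, shrink: float = 5.0) -> float:
    """Coarse-to-fine search: scan a wide range with a large step, then rescan
    a narrow window around the current best angle with a smaller step."""
    ys, xs = np.nonzero(binary)  # foreground pixel coordinates (row, column)
    if xs.size == 0:
        return 0.0
    xs, ys = xs.astype(float), ys.astype(float)
    center, half_width, step = 0.0, max_angle, coarse_step
    while True:
        candidates = np.arange(center - half_width, center + half_width + step / 2, step)
        scores = [profile_score(xs, ys, a) for a in candidates]
        center = float(candidates[int(np.argmax(scores))])
        if step <= min_step:
            return center
        # Next window spans one old step on each side of the current best angle.
        half_width, step = step, step / shrink
```

With these defaults the search evaluates 53 candidate angles (31 + 11 + 11), whereas a single scan over ±15° at the final 0.04° step would need 751; that saving is what a coarse-to-fine strategy is meant to provide. Positive angles here mean text lines that descend to the right in image coordinates.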
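The transformation model and the Hough voting step can be sketched as below. This assumes a similarity transform p' = s * R(theta) * p + t (rotation theta, uniform scale s, translation t) between the template and the input, and assumes each form has already been reduced to a set of table-line intersection points. Neither the point features nor the exact parameterization is given in the abstract; the function names, the angle and scale grids, and the 2-pixel translation bin are hypothetical. For every candidate (theta, s), each pair of an input point and a transformed template point votes for one translation bin, and the accumulator cell with the most votes gives the estimate used to normalize the input.

```python
import numpy as np


def rotation(angle_deg: float) -> np.ndarray:
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])


def apply_similarity(points: np.ndarray, angle_deg: float, scale: float, t: np.ndarray) -> np.ndarray:
    """Assumed model: p' = scale * R(angle) p + t, with points stored as rows."""
    return scale * points @ rotation(angle_deg).T + t


def hough_similarity(template_pts, input_pts, angles_deg, scales, bin_size=2.0):
    """Vote over (angle, scale, tx, ty); returns (angle, scale, t, votes)."""
    best_votes, best_params = -1, None
    for ang in angles_deg:
        for s in scales:
            moved = apply_similarity(template_pts, ang, s, np.zeros(2))
            # Every (input point, moved template point) pair proposes t = p' - s R p.
            diffs = (input_pts[:, None, :] - moved[None, :, :]).reshape(-1, 2)
            cells = np.round(diffs / bin_size).astype(int)
            keys, counts = np.unique(cells, axis=0, return_counts=True)
            k = int(np.argmax(counts))
            if counts[k] > best_votes:
                best_votes = int(counts[k])
                best_params = (float(ang), float(s), keys[k] * bin_size)
    return (*best_params, best_votes)


def normalize(input_pts: np.ndarray, angle_deg: float, scale: float, t: np.ndarray) -> np.ndarray:
    """Invert the model: p = R(angle)^T (p' - t) / scale."""
    return (input_pts - t) @ rotation(angle_deg) / scale
```

True correspondences all agree on a single translation while unrelated pairs scatter their votes, so the peak tolerates missing or extra points. The price of this sketch is a brute-force loop over the angle and scale grid.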
On the other hand, a flexible template allows the size or the number of some cells in a fixed form to vary. A dynamic programming (DP) approach is used to handle this variation. First, the best match between the inner cells of two form documents is obtained with DP. Second, the similarities of the matched cells are calculated and accumulated into the final similarity score of the two forms. Finally, the template form in the form database with the highest similarity score to the input form document is selected, and its category is output as the type of the input form document. Experimental results show that the classification system achieves good classification performance.
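The cell matching and template selection steps can be sketched as follows, assuming each form is summarized as a reading-order sequence of cells described by (width, height). The cell descriptor, the size-ratio similarity, the gap penalty of 0.3, and dividing the accumulated score by the larger cell count (so templates with many cells are not favored) are all hypothetical choices for illustration. The DP here is an order-preserving alignment in the style of Needleman-Wunsch: each cell either matches one cell of the other form or is left unmatched at a cost, which accommodates cells added, removed, or resized in a flexible form.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[float, float]  # hypothetical descriptor: (width, height) of one cell


def cell_similarity(a: Cell, b: Cell) -> float:
    """Size agreement in (0, 1]; 1.0 means identical width and height."""
    return (min(a[0], b[0]) / max(a[0], b[0])) * (min(a[1], b[1]) / max(a[1], b[1]))


def match_cells(xs: Sequence[Cell], ys: Sequence[Cell], gap: float = 0.3) -> List[Tuple[int, int]]:
    """DP alignment of two cell sequences; returns matched (xs index, ys index) pairs."""
    n, m = len(xs), len(ys)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]  # dp[i][j]: best score for xs[:i] vs ys[:j]
    move = [[""] * (m + 1) for _ in range(n + 1)]  # back-pointers for reconstruction
    for i in range(1, n + 1):
        dp[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        dp[0][j], move[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (dp[i - 1][j - 1] + cell_similarity(xs[i - 1], ys[j - 1]), "diag"),  # match
                (dp[i - 1][j] - gap, "up"),  # xs[i-1] left unmatched
                (dp[i][j - 1] - gap, "left"),  # ys[j-1] left unmatched
            ]
            dp[i][j], move[i][j] = max(options, key=lambda o: o[0])
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def form_similarity(xs: Sequence[Cell], ys: Sequence[Cell], gap: float = 0.3) -> float:
    """Accumulate the similarities of matched cells (normalization is an assumption)."""
    total = sum(cell_similarity(xs[i], ys[j]) for i, j in match_cells(xs, ys, gap))
    return total / max(len(xs), len(ys), 1)


def classify(cells: Sequence[Cell], templates: Dict[str, Sequence[Cell]]) -> str:
    """Return the name of the template with the highest similarity score."""
    return max(templates, key=lambda name: form_similarity(cells, templates[name]))
```

For example, an input invoice with one widened cell and one extra cell still aligns with the invoice template cell by cell, with the extra cell left unmatched, so it scores higher against that template than against an unrelated one.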
Keywords/Search Tags: Skew correction, Haar-like features, form document classification, flexible template, Hough voting, dynamic programming