
Constrained Form Recognition System

Posted on: 2007-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
GTID: 2178360185454118
Subject: Computer system architecture
Abstract/Summary:
Form recognition is a computer-assisted process in which characters printed on paper or other media are recognized. From the standpoint of theoretical research, form recognition is an application of pattern recognition and artificial intelligence. From the standpoint of application, it is an automated, high-speed input method for information processing and an important component of the new generation of intelligent computer interfaces.

In recent years, the automatic input, storage, and management of forms has become an important part of intelligent document processing. Form recognition has become a hotspot in OCR research and is attracting growing attention. In practice, however, form structures are complex and varied, which makes it very difficult to find a general method that recognizes all kinds of forms well. On the other hand, many fields need to process only forms with regular, fixed layouts. We therefore design and implement a constrained form recognition system.

This thesis studies constrained form recognition in depth from the perspective of image processing and pattern recognition. The goal is to improve the accuracy of form recognition and to meet the growing demand for intelligent form processing. The main body of the work concerns the extraction of form cells. First, a flexible and extensible form structure description is introduced. Second, after the skew of the form is detected and corrected, we search locally for the form lines to locate each form cell. Finally, and most importantly, we extract the form cells from the form image. This step raises two problems: characters that overlap the frame lines, and characters that extend beyond the cell's frame lines. We develop a new adaptive separation algorithm based on distance weighting to solve the former, and we solve the latter by counting connected components.
As a result, the system's extraction results are accurate and complete. We also optimize each stage to improve the system's efficiency. Experiments show that the system performs very well when processing batches of forms of the same type.
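The abstract does not specify the format of the form structure description, though the keywords point to XML. Here is a minimal sketch that assumes a hypothetical schema. In it, a `<form>` element declares a grid, and each `<cell>` element gives its row, column, optional span, and expected content type. The sketch loads such a template with Python's standard `xml.etree.ElementTree` and sanity-checks it. The element and attribute names, `CellSpec`, and `parse_template` are illustrative assumptions, not the thesis's actual format.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Hypothetical template: the element and attribute names are illustrative only.
TEMPLATE = """
<form name="expense-claim" rows="2" cols="3">
  <cell id="name" row="0" col="0" colspan="2" type="text"/>
  <cell id="date" row="0" col="2" type="date"/>
  <cell id="amount" row="1" col="0" colspan="3" type="digits"/>
</form>
"""

class CellSpec(NamedTuple):
    cell_id: str
    row: int
    col: int
    rowspan: int
    colspan: int
    kind: str  # expected content type, e.g. "text" or "digits"

def parse_template(xml_text):
    """Load a form template and check that every cell fits inside the declared grid."""
    root = ET.fromstring(xml_text.strip())
    rows, cols = int(root.get("rows")), int(root.get("cols"))
    cells = []
    for el in root.iter("cell"):
        spec = CellSpec(
            cell_id=el.get("id"),
            row=int(el.get("row")),
            col=int(el.get("col")),
            rowspan=int(el.get("rowspan", "1")),  # spans default to a single cell
            colspan=int(el.get("colspan", "1")),
            kind=el.get("type", "text"),
        )
        if spec.row + spec.rowspan > rows or spec.col + spec.colspan > cols:
            raise ValueError(f"cell {spec.cell_id!r} exceeds the {rows}x{cols} grid")
        cells.append(spec)
    return root.get("name"), rows, cols, cells
```

Keeping the layout in a declarative file like this means a new form type needs only a new template, not new code. That is one plausible reading of "flexible and extensible."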
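The abstract does not say how skew is detected. One common technique is a projection-profile search. It tries a range of candidate angles and undoes each one. It then keeps the angle at which the horizontal projection of the foreground pixels is sharpest, because the frame lines collapse onto single rows. The sketch assumes the foreground pixels have already been collected as (x, y) coordinates. The function names, the ±5° search range, and the 0.25° step are assumptions, and the thesis may use a different method.

```python
import math
from collections import Counter

def projection_score(points, angle_deg):
    """Sum of squared row counts after undoing a candidate skew of angle_deg.

    Rotating by -angle_deg maps a line drawn at angle_deg onto a single row,
    so the correct angle yields the sharpest (highest-scoring) profile.
    """
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows = Counter()
    for x, y in points:
        rows[round(y * cos_a - x * sin_a)] += 1
    return sum(c * c for c in rows.values())

def estimate_skew(points, max_deg=5.0, step=0.25):
    """Try every angle in [-max_deg, max_deg] at the given step; return the best one."""
    n = int(round(max_deg / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda ang: projection_score(points, ang))

def deskew(points, angle_deg):
    """Rotate points by -angle_deg about the origin to correct the detected skew."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    return [(x * cos_a + y * sin_a, y * cos_a - x * sin_a) for x, y in points]
```

Squaring the row counts rewards concentration. This means a profile with a few tall peaks (aligned lines) scores higher than a spread-out one.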
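Local line search can be illustrated under three assumptions. First, the template supplies approximate positions for each horizontal and vertical frame line. Second, the deskewed image is a binary grid (a list of rows of 0/1). Third, the lines are one pixel wide. The sketch refines each line to the row (or column) with the most foreground pixels within a small window around its expected position. It then takes each cell as the interior between consecutive lines. `locate_line`, `locate_cells`, and the default window size are hypothetical.

```python
def locate_line(grid, expected, window):
    """Index within +/- window of expected whose row has the most foreground pixels."""
    lo = max(0, expected - window)
    hi = min(len(grid) - 1, expected + window)
    return max(range(lo, hi + 1), key=lambda i: sum(grid[i]))

def locate_cells(image, expected_rows, expected_cols, window=5):
    """Refine template line positions locally; return cell interiors as (top, left, bottom, right)."""
    rows = [locate_line(image, r, window) for r in expected_rows]
    columns = [list(col) for col in zip(*image)]  # transpose so vertical lines become rows
    cols = [locate_line(columns, c, window) for c in expected_cols]
    cells = []
    for r0, r1 in zip(rows, rows[1:]):
        for c0, c1 in zip(cols, cols[1:]):
            # Interior excludes the (assumed one-pixel-wide) frame lines themselves.
            cells.append((r0 + 1, c0 + 1, r1 - 1, c1 - 1))
    return cells
```

Searching only near the template's expected positions keeps the cost low. It also stops a dense row of handwriting elsewhere on the page from being mistaken for a frame line.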
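The abstract names a distance-weighted adaptive algorithm for separating characters from the frame lines they overlap but gives no details. What follows is only one possible reading, not the thesis's method. Consider each foreground pixel inside a horizontal line band (rows `top` to `bottom`). Stroke pixels directly above and below the band vote for it with weight 1/d, where d is their vertical distance from the pixel. The pixel is kept as part of a character when its support reaches a fixed fraction of the maximum attainable support. Normalizing by that maximum is what makes the threshold adapt to line thickness and pixel position in this sketch. `separate_line`, `reach`, and `ratio` are hypothetical names and parameters.

```python
def separate_line(image, top, bottom, reach=3, ratio=0.5):
    """Erase a horizontal frame line (rows top..bottom) while keeping crossing strokes.

    Hypothetical distance-weighted rule: stroke pixels up to `reach` rows above and
    below the band vote with weight 1/distance; a band pixel survives when its support
    is at least `ratio` times the support it would get if every voter were foreground.
    """
    height = len(image)
    out = [row[:] for row in image]
    for x in range(len(image[0])):
        for y in range(top, bottom + 1):
            if not image[y][x]:
                continue
            support = max_support = 0.0
            for k in range(1, reach + 1):
                w_up = 1.0 / (y - top + k)       # distance from (y, x) to row top - k
                w_down = 1.0 / (bottom - y + k)  # distance from (y, x) to row bottom + k
                max_support += w_up + w_down
                if top - k >= 0 and image[top - k][x]:
                    support += w_up
                if bottom + k < height and image[bottom + k][x]:
                    support += w_down
            out[y][x] = 1 if support >= ratio * max_support else 0
    return out
```

With this rule, a stroke that crosses the line keeps its pixels inside the band, and bare line pixels are erased. A stroke that only touches the line from one side is extended partway into the band.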
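For characters that extend beyond the cell frame, the abstract says only that the problem is solved by counting connected components. The following is a hedged sketch, assuming the frame lines have already been removed so that each character stroke forms its own component. It labels 8-connected components and counts how many pixels of each fall inside the cell. A component is assigned to the cell when the majority of its pixels lie inside it. The cell's content box then grows to cover strokes that overflow the frame, while strokes intruding from a neighboring cell are left out. The majority rule and the names `connected_components` and `cell_content_box` are assumptions for illustration.

```python
from collections import deque

def connected_components(image):
    """Label 8-connected foreground components; each is a list of (y, x) pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def cell_content_box(image, cell):
    """Bounding box (top, left, bottom, right) of the components owned by a cell, or None."""
    top, left, bottom, right = cell
    kept = []
    for comp in connected_components(image):
        inside = sum(1 for y, x in comp if top <= y <= bottom and left <= x <= right)
        if 2 * inside > len(comp):  # hypothetical majority rule
            kept.extend(comp)
    if not kept:
        return None
    ys = [y for y, _ in kept]
    xs = [x for _, x in kept]
    return (min(ys), min(xs), max(ys), max(xs))
```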
Keywords/Search Tags: Constrained form recognition, Optical character recognition, Pattern recognition, Extensible markup language, Separation of characters and lines, Connected component