
Research And System Implementation Of Distorted And Variable Length Form Recognition

Posted on: 2022-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: S J Wang
GTID: 2518306773990609
Subject: Automation Technology
Abstract/Summary:
With the arrival of the information age, many industries are adopting intelligent systems, and electronic information management and office automation have become inevitable trends. Digitizing paper forms by manual entry alone is too labor-intensive, so there is a growing need for form image recognition systems that extract structured information from semi-structured data. Because form images have complex structure and are captured at uneven quality, form image recognition remains a challenging research direction. Existing form recognition methods generally adapt poorly: when the form image is distorted or the form length varies, their performance drops sharply and such cases often cannot be processed at all. This thesis focuses on these two problems, distorted form image recognition and variable-length form image recognition. It proposes two algorithms, one for distorted form information extraction based on local space matching and one for variable-length form information extraction based on layout structure migration, and on this basis designs and develops a form recognition system. The main work of this thesis is as follows:

1. An algorithm for extracting distorted form information based on local space matching is proposed. General template matching algorithms have difficulty locating the corresponding regions when the form in the image is distorted or deformed. Therefore, after extracting features of the reference regions on the template image and the query image, this thesis uses multi-stage matching to match the reference regions. A local mapping relationship is then constructed by multi-factor weighting to locate each recognition region, which to a certain extent alleviates the mislocation of recognition regions caused by form distortion. In addition, this thesis establishes a distorted form image dataset, FID, and uses the overlap of the recognition regions as the evaluation index. On this dataset, the accuracy of the proposed method reaches 88.5%.

2. A variable-length form information extraction algorithm based on layout structure migration is proposed. A variable-length form is one in which the number of recognition regions of the same category within a certain area of the form is not fixed, which causes structural differences between instances of the same form class. Therefore, this thesis first models the template image as a graph, with the reference regions and the recognition regions as nodes. The position features and semantic features of the regions are fused into relational features. These relational features are used to transfer the layout structure between the reference regions and the recognition regions on the template to the query image, and the model is then optimized using the message passing algorithm of conditional random fields. In addition, this thesis establishes a variable-length form image dataset, FIV, on which the method achieves an accuracy of 85.63%.

3. A form recognition system is designed. Building on the two algorithms above, pre-processing and post-processing modules are integrated to implement an end-to-end recognition pipeline for form image information. The system is evaluated experimentally against an existing form recognition API in terms of running time, region accuracy, and recognition accuracy. The experimental results show that the system outperforms the existing form recognition API on both the FID and FIV datasets and has practical application value.
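
The abstract does not give the details of the local mapping or of the overlap index, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes axis-aligned boxes, replaces the multi-factor weighting with a single inverse-distance weight over matched reference regions, measures overlap as intersection over union (IoU), and counts a region as correct at an assumed threshold of 0.5. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def locate_region(target: Box, matches: List[Tuple[Box, Box]],
                  eps: float = 1e-6) -> Box:
    """Shift a template region onto the query image.

    `matches` holds (template_reference, query_reference) box pairs.
    Each pair votes for a translation; nearer reference regions get a
    larger weight (assumption: inverse distance is the only factor).
    """
    if not matches:
        return target  # nothing to map from; keep the template position
    tx, ty = center(target)
    weight_sum = dx = dy = 0.0
    for template_ref, query_ref in matches:
        rx, ry = center(template_ref)
        qx, qy = center(query_ref)
        dist = ((tx - rx) ** 2 + (ty - ry) ** 2) ** 0.5
        w = 1.0 / (dist + eps)
        weight_sum += w
        dx += w * (qx - rx)
        dy += w * (qy - ry)
    dx /= weight_sum
    dy /= weight_sum
    return (target[0] + dx, target[1] + dy, target[2] + dx, target[3] + dy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def region_accuracy(predicted: List[Box], truth: List[Box],
                    threshold: float = 0.5) -> float:
    """Fraction of regions whose IoU with the ground truth meets the
    threshold (0.5 is an assumed value, not taken from the thesis)."""
    if not truth:
        return 0.0
    hits = sum(1 for p, t in zip(predicted, truth) if iou(p, t) >= threshold)
    return hits / len(truth)
```

A purely translational, distance-weighted mapping is the simplest local model; it cannot capture rotation or scaling, which a fuller treatment of distortion would need.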
Keywords/Search Tags: Form Recognition, Information Extraction, Distorted Forms, Template Matching, Variable-length Forms