Research On Key Problems In The Form Identification

Posted on: 2017-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: L He
Full Text: PDF
GTID: 2308330482975627
Subject: Software engineering
Abstract/Summary:
At present, most form data is collected manually before it is tabulated and analyzed. With the rapid development of computer technology, automatic recognition and analysis of form images has become an inevitable trend. Although OCR-based form recognition systems have been applied in fields such as mail sorting, bank note analysis, and ballot counting, problems remain in the automatic recognition of general data forms, such as questionnaires with no fixed layout constraints or filling restrictions.

This thesis is set against the background of the research project "Research and system construction on automatic information collection and statistical analysis technology based on image recognition". It focuses on key technical problems in form processing that have not yet been solved well: form image registration, table recognition, and handwritten symbol recognition.

To address form image registration, a document image registration method based on local feature images and Harris feature point detection is proposed. Harris feature points detected in local feature images extracted from the form images serve as the basis for estimating the spatial transformation parameters. The feature points are matched and then refined in a second matching pass, after which image registration is completed accurately and efficiently.

To recognize the table information in a form image, a stepwise table feature extraction method is proposed. First, a contour extraction method based on connected components extracts the table contour and determines the table region. Next, a table line extraction method based on mathematical morphology extracts the table lines. Finally, the table cell information is obtained from the intersection features of the extracted table lines, which completes recognition of the form image.

For handwritten symbols in the images, a recognition method based on a convolutional neural network, a deep learning model, is used. A network suited to the characteristics of handwritten symbols is built and trained on a rich training set, so that it can accurately identify handwritten symbols in the form that are written in a non-standard manner and subject to background interference.

Experiments and analysis of the algorithms are carried out on form images. The algorithms proposed in this thesis can effectively address a series of key problems in form recognition. With them, accurate and efficient form image registration, table recognition, and handwritten symbol recognition have been basically achieved, which provides a reliable basis for subsequent statistical work and lays the foundation for large-scale form recognition and statistics.
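
As an illustration of the morphology-based table line extraction and intersection steps described above, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the list-of-lists binary image representation, and the minimum line length `k` are all assumptions made for this example. It relies on the fact that opening a binary image with a 1 x k line element keeps exactly the foreground runs that are at least k pixels long.

```python
def open_horizontal(img, k):
    """Morphological opening with a 1 x k horizontal line element.

    Keeps only foreground pixels (1) that belong to a horizontal run of
    at least k consecutive foreground pixels; everything else becomes 0.
    """
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:  # run is long enough to be a table line
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img):
    """Swap rows and columns of a rectangular list-of-lists image."""
    return [list(col) for col in zip(*img)]


def open_vertical(img, k):
    """Opening with a k x 1 vertical line element, via transposition."""
    return transpose(open_horizontal(transpose(img), k))


def line_intersections(img, k):
    """Return (row, col) pixels lying on both a horizontal and a vertical line."""
    h = open_horizontal(img, k)
    v = open_vertical(img, k)
    return [(y, x)
            for y in range(len(img))
            for x in range(len(img[0]))
            if h[y][x] and v[y][x]]


# Hypothetical 5 x 5 form: a one-cell table border plus a stray mark inside.
form = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(line_intersections(form, 3))  # [(0, 0), (0, 4), (4, 0), (4, 4)]
```

The stray mark at the center is shorter than `k` in both directions, so it is removed by both openings and only the four corner intersections of the table border remain. A real form image would need binarization first and a value of `k` chosen relative to the expected cell size.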
Keywords/Search Tags: Form processing, Image registration, Table recognition, Handwritten symbol recognition, Convolutional neural network