Distorted Document Skew Correction Based On Point Cloud Data

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zheng
Full Text: PDF
GTID: 2308330482988309
Subject: Software engineering
Abstract/Summary:
Paper documents are currently digitized mainly as two-dimensional (2D) document images. Skew correction of the document image is an important step in improving the recognition rate of optical character recognition (OCR). Existing skew correction methods include the Hough transform method, the projection method, the connected-domain method, the Radon transform method, and others. When a document is both distorted and tilted, these methods perform poorly because the text lines themselves are distorted. They also perform poorly on images that mix graphics and text, on double-page document images, and in similar cases. As a result, the OCR recognition rate drops.
This thesis proposes an algorithm for correcting the tilt of distorted documents based on point cloud data. First, a 3D scanner is used to collect the point cloud data and the texture data of the document. The structural features of the document's point cloud model and its position in space are then analyzed, and the tilt is classified, according to how the points are distributed, as space tilt or plane tilt. Tilt correction is accordingly divided into space tilt correction and plane tilt correction. Second, in the space tilt correction step, several boundary points of the document model are collected. Newell's method, which iterates over these points, is then used to fit a plane in space and obtain the normal vector of the plane on which the base of the point cloud model lies. The cross product of this normal vector and the Z axis gives the rotation vector. Finally, a spatial affine transformation is applied to correct the space tilt.
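The space tilt correction step can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`newell_normal`, `rotate`, `level_points`) are hypothetical, the boundary points are assumed to form an ordered loop around the document, the normal is flipped to point toward +Z as a design assumption, and the affine transformation is reduced to a pure rotation about the origin using Rodrigues' formula.

```python
import math

def newell_normal(boundary):
    """Unit normal of the plane through an ordered loop of 3D points (Newell's method)."""
    nx = ny = nz = 0.0
    count = len(boundary)
    for i in range(count):
        x0, y0, z0 = boundary[i]
        x1, y1, z1 = boundary[(i + 1) % count]  # next point, wrapping around
        nx += (y0 - y1) * (z0 + z1)
        ny += (z0 - z1) * (x0 + x1)
        nz += (x0 - x1) * (y0 + y1)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("boundary points are degenerate")
    return (nx / length, ny / length, nz / length)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def rotate(v, axis, angle):
    """Rotate v about a unit axis by angle (radians) with Rodrigues' formula."""
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(axis, v)
    kdv = dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1.0 - c) for i in range(3))

def level_points(points, boundary):
    """Rotate all points so the fitted base plane becomes parallel to the X-Y plane."""
    n = newell_normal(boundary)
    if n[2] < 0.0:
        # Assumption: orient the normal toward +Z so the page is not turned over.
        n = (-n[0], -n[1], -n[2])
    z_axis = (0.0, 0.0, 1.0)
    axis = cross(n, z_axis)  # rotation vector: normal crossed with the Z axis
    axis_len = math.sqrt(dot(axis, axis))
    if axis_len < 1e-12:
        return [tuple(p) for p in points]  # already level, nothing to rotate
    axis = tuple(a / axis_len for a in axis)
    angle = math.acos(max(-1.0, min(1.0, dot(n, z_axis))))
    return [rotate(p, axis, angle) for p in points]
```

Calling `level_points(cloud, boundary)` with the full point cloud and its sampled boundary returns a cloud whose base plane is parallel to the X-Y plane. Because the rotation is about the origin, that plane may still sit at a nonzero Z offset.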
Third, in the plane tilt correction step, the base plane of the model is already parallel to the X-Y plane because the space tilt has been corrected, so the document model is projected onto the Z=0 plane. Several boundary points are then collected, and a least-squares linear fit is used to obtain the equation of the straight line along the model boundary. The plane tilt angle is derived from the slope of this boundary line, and a two-dimensional coordinate rotation rotates the projection of the document model by that angle. Finally, the X and Y values of the document model are replaced by the X and Y values of the rotated projection. Projecting the model, and thereby reducing its dimension, lowers the amount of computation.
Finally, a software system implementing the above algorithm was built and tested on skewed document models. The experimental data show that the algorithm can effectively detect and correct skewed, distorted documents within the permitted error range, and that it effectively reduces the effect of document skew on the OCR recognition rate when distorted documents are rectified.
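The plane tilt correction step can be sketched in the same spirit. The names `fit_slope` and `correct_plane_tilt` are hypothetical, the boundary points are assumed to lie along a single, non-vertical edge of the document, and the rotation is taken about the origin. The thesis may handle these details differently.

```python
import math

def fit_slope(points_2d):
    """Slope k of the least-squares line y = k*x + b through 2D points."""
    count = len(points_2d)
    mean_x = sum(x for x, _ in points_2d) / count
    mean_y = sum(y for _, y in points_2d) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points_2d)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points_2d)
    if sxx == 0.0:
        raise ValueError("boundary is vertical; slope is undefined")
    return sxy / sxx

def correct_plane_tilt(points, boundary):
    """Rotate the X-Y projection so the fitted boundary line becomes horizontal."""
    # Project onto Z = 0 by dropping the Z coordinate of each boundary point.
    slope = fit_slope([(x, y) for x, y, _ in boundary])
    tilt = math.atan(slope)  # plane tilt angle from the boundary slope
    c, s = math.cos(-tilt), math.sin(-tilt)  # rotate by the opposite angle
    # Replace X and Y with the rotated projection; Z is kept unchanged.
    return [(x * c - y * s, x * s + y * c, z) for x, y, z in points]
```

Only the two-dimensional projection is fitted and rotated here, which is where the reduction in computation described above comes from: the Z values pass through untouched.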
Keywords/Search Tags: skew correction, point cloud data, distorted document, space tilt, plane tilt, OCR