Research And Realization Of Documents' Images Recognition Algorithm

Posted on:2010-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:X M Zhou

Full Text:PDF

GTID:2178360275486564

Subject:Computer technology

Abstract/Summary:

In this thesis, a document image processing system is designed and implemented. It can be applied to language identification. The system consists of image preprocessing, layout analysis, and language identification. The main contributions of this thesis include:

(1) Background. Images are an important source of the knowledge through which human beings understand the world. Foreign scholars have estimated that 70% of external information comes from images that people perceive with their eyes. The object of research has extended from the analog domain to the digital domain, and the concept of the digital image has emerged.

(2) Image denoising. An entropy-threshold binarization algorithm is improved by gradient adjustment, which is shown to avoid the loss of image edge information and of parts of the form lines. Second, because form images may be skewed, a skew detection and correction method based on the directional single-connected chain is adopted.

(3) Image preprocessing. To detect and correct the skew angle of an image, a method based on the Hough transform is presented. To reduce the computation of the Hough transform, it is modified in the following ways: an appropriate angle quantization step is chosen to decrease the number of angles; a sub-region rather than the whole image is used to reduce the data to be processed; and "featured pixels" are extracted to reduce the data further. To improve the quality of image rectification, the area that the original pixels occupy is used to interpolate the pixels left blank after rotation.

(4) Layout analysis. Layout segmentation and block recognition divide the layout into different geometric zones and generate blocks with different types of data. First, the layout is segmented into image, figure, and text levels. The main line segments are extracted from the image level and the figure level by mathematical morphology. The text level is analyzed by connectivity. Figures, tables, and text are discriminated by text blurring, edge detection, paragraph extraction, and projection periodicity estimation. Layout segmentation and block recognition are combined in this algorithm, which improves processing efficiency.
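
The reduced-cost Hough skew detection described in item (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not define "featured pixels", so the sketch assumes they are black pixels whose lower neighbour is white. The function names, the `region` tuple, and the `max_angle` and `step` parameters are hypothetical. The image is assumed to be a binary list of rows with 1 for black, and a positive angle means a text line that descends to the right in image coordinates.

```python
import math


def featured_pixels(image, region=None):
    """Collect (x, y) of black pixels whose lower neighbour is white.

    Assumption: this definition of "featured pixels" is illustrative only.
    `region` is a hypothetical (top, bottom, left, right) sub-window; when
    omitted, the whole image is scanned.
    """
    height, width = len(image), len(image[0])
    top, bottom, left, right = region or (0, height, 0, width)
    points = []
    for y in range(top, bottom):
        for x in range(left, right):
            below_is_white = (y + 1 >= height) or image[y + 1][x] == 0
            if image[y][x] == 1 and below_is_white:
                points.append((x, y))
    return points


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the strongest Hough peak.

    Only angles in [-max_angle, max_angle] are tried, spaced by `step`,
    so a coarser step means fewer accumulators to fill.
    """
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        votes = {}
        for x, y in points:
            # Points on one straight line at this angle share the same rho.
            rho = int(round(y * cos_t - x * sin_t))
            votes[rho] = votes.get(rho, 0) + 1
        score = max(votes.values()) if votes else 0
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

In this sketch the three savings named in the abstract map to `step` (fewer angles), `region` (less of the image), and `featured_pixels` (fewer voting points). The estimated angle would then be passed to a separate rotation step, which is not shown here.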

Keywords/Search Tags:

Document Image Processing, edge test, seed fill, Layout Analysis, Skew Correction

Related items

1	The Key Technology Research Of Document Image Layout Analysis
2	Research On Skew Correction And Denoise For Document Image
3	A Study On Chinese Document Layout Analysis And Reconstruction
4	A Study On Chinese Document Layout Analysis
5	Study On Document Image Layout Understanding
6	A Binary Document Image High Compression Ratio Compression Algorithm
7	Pre-processing System And Algorithm Research Of Digital Information Card Document Image
8	The Study On Obtaining Chain Code And Document Layout Analysis
9	Research On Algorithms Of Document Image Processing And Form Image Identification
10	Research Of Skew Correction Algorithm For The Complex Document Images