The Key Technology Research On Chinese Layout Analysis

Posted on: 2008-02-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Jin
Full Text: PDF
GTID: 1118360215498565
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Layout analysis is an important part of document analysis and understanding. It is used to convert the content of paper documents into electronic information so that the whole layout can be digitized. Among document layouts, Chinese layouts have particularly varied composition, and Chinese characters are structurally complex. This makes Chinese layouts harder to analyze than layouts in alphabetic languages, and the difficulty is currently a bottleneck in the development of layout analysis technology. The study of layout analysis therefore has both theoretical significance and practical value.

Because layouts are complex, the range of objects studied in layout analysis is extremely wide. Different kinds of layout carry different information and therefore need different processing methods. This dissertation studies and presents a number of key technologies of Chinese layout analysis: skew detection and correction, block segmentation and recognition, determination of logical order in the layout, and table recognition. The main contributions are as follows.

1. Layout skew detection algorithm based on window transform. A scanned layout is inevitably skewed, which negatively affects subsequent processing. This algorithm selects a suitable window for skew detection and correction. Skew is detected by processing the detailed content of the window at varying resolutions and fitting a line to the extracted feature points. Experimental results show that the algorithm adapts well and detects the skew of different layouts rapidly and accurately.

2. Layout skew detection algorithm based on edge enhancement. Because a complicated layout reduces the efficiency of window selection, a second skew detection algorithm based on edge enhancement is proposed. In this algorithm, an image block is obtained by applying an operator to the image, and the edge information of the image block represents the original edge information. A 4-direction chain code represents the edge of the image block, from which approximate line information is extracted. Finally, the skew angle is calculated by the least squares method. Experimental results show that the algorithm is accurate and rapid, and independent of the content of the layout.

3. Layout segmentation and block recognition algorithm based on hierarchy extraction. Layout segmentation and block recognition divides the layout into geometrical zones and generates blocks holding different types of data. First, the layout is segmented into image, figure and text levels. The main line segments are extracted from the image level and the figure level by mathematical morphology, and the text level is analyzed by connectivity. Figures, tables and text are discriminated by text blurring, edge detection, paragraph extraction and projection periodicity estimation. The algorithm combines layout segmentation with block recognition, which improves processing efficiency.

4. Determination of logical order in the layout based on a directed graph. A directed graph of spatial structure is built by analyzing the spatial structure of the layout objects. This turns the determination of the logical order of layout objects into a traversal search of the directed graph, from which the logical order is obtained. Experiments confirm the efficiency of this method.

5. Table recognition algorithm based on a directed graph. A table model is established by extracting the features and attributes of an empty table. Features are then extracted from the table to be recognized. Recognition is achieved through logical relationships and two-stage matching, which uses the matching similarity of feature lines between the model and the table to be recognized. This improves recognition accuracy. Experimental results show that the algorithm is flexible and efficient.

Finally, an experimental system for analyzing bill layouts was built to validate the above algorithms: skew detection and correction, layout segmentation and block recognition, determination of logical order in the layout, and table recognition. Experimental results show that these algorithms are effective and general in analyzing bill images.
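
The least-squares step of the edge-enhancement skew detector (contribution 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the points of one approximately horizontal edge line have already been extracted, and the function name `skew_angle_degrees` and the sample points are hypothetical.

```python
import math


def skew_angle_degrees(points):
    """Estimate skew from (x, y) edge points by least-squares line fitting.

    Fits y = a*x + b in the least-squares sense and returns atan(a) in
    degrees. Assumption: the points lie along a near-horizontal edge, so the
    fitted slope is finite. The sign of the result follows whatever axis
    convention the input coordinates use.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sums of squared deviations and cross deviations about the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    if sxx == 0:
        raise ValueError("all points share one x value; slope is undefined")

    slope = sxy / sxx
    return math.degrees(math.atan(slope))


# Hypothetical sample: points on a line tilted by 2 degrees.
tilt = math.tan(math.radians(2.0))
sample_points = [(x, 5.0 + tilt * x) for x in range(0, 200, 10)]
estimated = skew_angle_degrees(sample_points)
```

For the synthetic points above the estimate equals the 2-degree tilt up to floating-point error. On real edge points the estimate depends on how cleanly the line was extracted, and correction would rotate the image by the negative of the estimated angle.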
Keywords/Search Tags: document image processing, layout analysis, skew detection, form recognition