Chinese Layout Analysis With Antecedent Non-text Regions

Posted on:2005-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

GTID:2168360125954769

Subject:Computer application technology

Abstract/Summary:

In Chinese documents (especially Chinese newspapers), non-text regions and text regions are typically interleaved, and the non-text regions interfere with the extraction of the text regions. To address this characteristic, we propose a Chinese layout analysis method that handles the non-text regions first. First, we extract the non-text regions and remove them so that they do not interfere with the extraction of the text regions. Then we process the text regions with a method based on run-length smoothing and minimal spanning tree clustering, applying different procedures to text regions with different alignments. Finally, the text regions obtained from the clustering are segmented according to the positions of the non-text regions. Experiments indicate that the method performs better on documents in which horizontally aligned and vertically aligned text regions are mixed and in which text regions and non-text regions are intermixed.
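
The abstract names run-length smoothing and minimal spanning tree clustering but gives no implementation details. The following is only a minimal sketch of those two generic techniques, not the thesis's actual method. The function names (rlsa_row, mst_clusters), the thresholds, and the use of block centre points as clustering input are all assumptions made for illustration.

```python
import math

def rlsa_row(row, threshold):
    """Run-length smoothing on one binary row (1 = ink, 0 = background).

    Background runs of at most `threshold` pixels that lie between two
    ink pixels are filled, which merges nearby characters into blocks.
    The threshold value is an assumption, not taken from the thesis.
    """
    out = list(row)
    last_ink = None
    for i, value in enumerate(row):
        if value == 1:
            if last_ink is not None and 0 < i - last_ink - 1 <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out

def mst_clusters(points, max_edge):
    """Group points by building a spanning forest with Kruskal's algorithm,
    skipping every edge longer than `max_edge`.

    This equals cutting the long edges of a minimal spanning tree.
    `points` are hypothetical block centres; `max_edge` is an assumed cutoff.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a in range(len(points))
        for b in range(a + 1, len(points))
    )
    for weight, a, b in edges:
        if weight > max_edge:
            break  # all remaining edges are longer still
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

smoothed = rlsa_row([1, 0, 0, 1, 0, 0, 0, 0, 1], threshold=2)
clusters = mst_clusters([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)], max_edge=2)
```

In this sketch, smoothing would be applied row by row for horizontally aligned text and column by column for vertically aligned text, which is one plausible reading of applying different procedures to differently aligned regions. The thesis may do this differently.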

Keywords/Search Tags:

Character Recognition, Layout Analysis, Run-length Smoothing Algorithm, Minimal Spanning Tree Clustering

Related items

1	Document Layout Analysis Based On Neighborhood Features
2	Research On Clustering Algorithm Based On Minimum Spanning Tree
3	A Study On Layout Analysis Method Of Complex Structure Name Cards Recognition
4	Support Vector Clustering Method And Its Applications To Biomedical Datasets
5	Research Of Clustering Algorithms Based On Minimum Spanning Tree
6	Research On Clustering Algorithm Based On Minimum Spanning Tree
7	A Study On Chinese Document Layout Analysis And Reconstruction
8	A Hybrid Framework For Physics Problems Recognition
9	Application Of The Clustering Analysis In The Large Vocabulary Chinese Character Recognition
10	A Study On Chinese Document Layout Analysis