
Chinese Layout Analysis With Antecedent Non-text Regions

Posted on: 2005-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 2168360125954769
Subject: Computer application technology
Abstract/Summary:
In Chinese documents (especially Chinese newspapers), non-text regions and text regions are frequently interleaved, and the non-text regions interfere with the extraction of the text regions. To address this characteristic, we propose a Chinese layout analysis method that handles the non-text regions first. We begin by extracting the non-text regions and removing them, so that they do not interfere with the extraction of the text regions. We then process the text regions with a method based on run-length smoothing and minimal spanning tree clustering, applying different processing to differently aligned text regions. Finally, the text regions obtained from the clustering are segmented according to the positions of the non-text regions. Experiments indicate that the method is better suited to segmenting documents in which horizontally and vertically aligned text regions are mixed and in which text regions and non-text regions are intermixed.
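The abstract names run-length smoothing and minimal spanning tree clustering but gives no parameters or implementation details. The following is a minimal illustrative sketch of those two generic techniques, not the thesis's actual method. It assumes a binary image stored as lists of 0/1 values (1 = ink) and clusters block centers by Euclidean distance. The function names (`smooth_row`, `rlsa_horizontal`, `rlsa_vertical`, `mst_clusters`) and the parameters `threshold` and `max_edge` are hypothetical.

```python
import math
from itertools import combinations


def smooth_row(row, threshold):
    """Run-length smoothing of one binary sequence (1 = ink, 0 = background).

    A background run lying between two ink pixels is filled with ink
    when its length is at most `threshold` (an assumed parameter).
    """
    out = list(row)
    last_ink = None
    for i, value in enumerate(row):
        if value == 1:
            if last_ink is not None and 0 < i - last_ink - 1 <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, threshold):
    """Smooth every row; suits horizontally aligned text."""
    return [smooth_row(row, threshold) for row in image]


def rlsa_vertical(image, threshold):
    """Smooth every column; suits vertically aligned text."""
    columns = [smooth_row(list(col), threshold) for col in zip(*image)]
    return [list(row) for row in zip(*columns)]


def mst_clusters(points, max_edge):
    """Cluster points by building a minimum spanning tree (Kruskal)
    and cutting every tree edge longer than `max_edge` (assumed cut rule)."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(n), 2)
    )
    mst = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            mst.append((dist, a, b))

    # Rebuild components using only the short MST edges.
    parent = list(range(n))
    for dist, a, b in mst:
        if dist <= max_edge:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch, `smooth_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2)` fills the two-pixel gap but leaves the four-pixel gap open, and `mst_clusters([(0, 0), (1, 0), (10, 0), (11, 0)], 2.0)` separates the two nearby pairs into two clusters. Choosing between horizontal and vertical smoothing per region is one plausible reading of "different processing for differently aligned text regions", but the abstract does not specify how the thesis does this.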
Keywords/Search Tags: Character Recognition, Layout Analysis, Run-length Smoothing Algorithm, Minimal Spanning Tree Clustering