Space Tile-based Chinese Page Segmentation System

Posted on: 2003-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: N Yang
Full Text: PDF
GTID: 2168360062976448
Subject: Computer applications
Abstract/Summary:
An OCR (Optical Character Recognition) system has wide application in areas such as automatic input and digital libraries. It has two primary parts: online recognition and offline recognition. The page segmentation system studied in this paper is a crucial part of offline recognition. Its main function is to detect graphic areas, text areas, background space and so on. Page segmentation is a difficult problem in an OCR system and is also the basis of the steps that follow, so it plays an important role in the whole system.

An increasing number of publications do not have the "traditional" layout in which printed regions are rectangular. Text paragraphs and graphic areas may be of any shape, individually rotated and in any arrangement. This paper introduces a new method for segmenting images of document pages that have either traditional or complex layouts. The underlying idea is to efficiently produce a flexible description, by means of space tiles, of the background space that surrounds the printed regions in the page image under all of the above conditions. Using this description of space, the contours of printed regions are identified with significant accuracy. The new approach is fast because in most conditions there is no need for skew detection and correction, and only a few simple operations are performed on the description of the background rather than on the pixel-based data.

In addition to the page segmentation step, we also introduce a skew detection method based on two vertical lines. In some cases this method is needed as a complement to page segmentation.

At the end of this paper, we indicate the merits of this method and also point out some of its shortcomings. Finally, we discuss some prospects for the future development of OCR.
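The abstract does not give the details of how space tiles are built, so the following is only a minimal sketch of the general idea of describing background space with rectangles. It assumes a binary page image stored as a list of rows, with 0 for background and 1 for print. The names `white_runs` and `space_tiles`, and the rule of merging identical horizontal white runs on consecutive rows into one tile, are illustrative assumptions and not the thesis's actual algorithm.

```python
from typing import Dict, List, Tuple

# A tile is (top, bottom, left, right), all inclusive pixel coordinates.
Tile = Tuple[int, int, int, int]


def white_runs(row: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (left, right) spans of background (0) pixels in one row."""
    runs = []
    start = None
    for x, pixel in enumerate(row):
        if pixel == 0 and start is None:
            start = x                      # a white run begins
        elif pixel != 0 and start is not None:
            runs.append((start, x - 1))    # a white run ends before this pixel
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def space_tiles(image: List[List[int]]) -> List[Tile]:
    """Cover the background with rectangles (hypothetical tiling rule).

    A tile stays open while the same (left, right) white run repeats on
    consecutive rows; it is closed as soon as that run changes.
    """
    open_tiles: Dict[Tuple[int, int], int] = {}   # span -> row where it started
    tiles: List[Tile] = []
    for y, row in enumerate(image):
        runs = set(white_runs(row))
        for span in list(open_tiles):
            if span not in runs:                  # run did not continue: close tile
                top = open_tiles.pop(span)
                tiles.append((top, y - 1, span[0], span[1]))
        for span in runs:
            open_tiles.setdefault(span, y)        # start a new tile if needed
    for span, top in open_tiles.items():          # close tiles reaching the bottom
        tiles.append((top, len(image) - 1, span[0], span[1]))
    return sorted(tiles)


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
tiles = space_tiles(page)
```

Later steps would then work on this short list of tiles rather than on individual pixels, which is the source of the speed advantage the abstract describes. Tracing region contours from the tiles is not shown here.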
Keywords/Search Tags: OCR, Page segmentation, Smearing, Skew detection