
Clustering and Segmentation with Application in Document Image Processing

Posted on: 2018-03-07   Degree: Ph.D.   Type: Dissertation
University: Purdue University   Candidate: Xue, Haitao
GTID: 1448390002998019   Subject: Electrical engineering
Abstract/Summary:
In this dissertation, we introduce a set of algorithms for document image processing in the research areas of color clustering and binarization.

Color quantization algorithms select a small number of colors that accurately represent the content of a particular image. In this research, we introduce a novel color quantization algorithm based on minimizing a modified Lp norm rather than the more traditional L2 norm associated with mean square error (MSE) [1]. We demonstrate that the Lp optimization approach has two advantages. First, it produces results of higher perceived quality, especially for important colors in small regions. Second, the value of the norm can serve as an effective criterion for selecting the minimum number of colors needed to represent the image accurately.

Binarization algorithms create a binary representation of a raster document image, typically to identify text and separate it from background content. In this work, we propose a binarization algorithm based on one-pass local classification [2]. The algorithm first generates an initial binarization by local thresholding, then corrects it with a one-pass local classification strategy, and finally applies component inversion. Experimental results show that our algorithm achieves a much lower binarization error rate than other popular binarization/thresholding algorithms. The proposed algorithm also achieves a somewhat lower binarization error rate than the state-of-the-art COS algorithm [3] while requiring significantly less computation.
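To illustrate why an Lp objective with p > 2 favors small but distinct color regions, and how its value can drive the choice of palette size, here is a minimal Python sketch. It assumes a plain (unmodified) Lp error taken per color channel, a Lloyd-style alternation between pixel assignment and palette refitting, a deterministic farthest-point initialization, and a hypothetical error threshold for stopping; the dissertation's modified norm and its optimizer are not reproduced here. The names `lp_center`, `quantize`, `lp_error`, and `select_num_colors` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def lp_dist(x: Sequence[float], c: Sequence[float], p: float) -> float:
    """Sum over channels of |x_j - c_j|^p (the p-th power of the Lp distance)."""
    return sum(abs(a - b) ** p for a, b in zip(x, c))


def lp_center(values: Sequence[float], p: float, steps: int = 100) -> float:
    """1-D minimizer of sum_i |v_i - c|^p for p >= 1, found by bisection.

    The derivative g(c) = sum_i sign(c - v_i) |c - v_i|^(p-1) is nondecreasing,
    so its root lies in [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    for _ in range(steps):
        mid = (lo + hi) / 2
        g = sum(math.copysign(abs(mid - v) ** (p - 1), mid - v) for v in values if v != mid)
        if g < 0:
            lo = mid  # derivative negative: the minimizer lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def nearest(x: Color, centers: List[Color], p: float) -> int:
    """Index of the palette color closest to x in Lp distance."""
    return min(range(len(centers)), key=lambda j: lp_dist(x, centers[j], p))


def quantize(pixels: List[Color], k: int, p: float = 4.0, iters: int = 10):
    """Alternate Lp assignment and per-channel Lp refit of each palette color."""
    # Farthest-point initialization (an assumption, chosen for determinism).
    centers: List[Color] = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda x: min(lp_dist(x, c, p) for c in centers)))
    for _ in range(iters):
        labels = [nearest(x, centers, p) for x in pixels]
        new_centers: List[Color] = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(pixels, labels) if lab == j]
            if members:
                new_centers.append(tuple(lp_center([m[ch] for m in members], p) for ch in range(3)))
            else:
                new_centers.append(c)  # empty cluster: keep its previous color
        centers = new_centers
    labels = [nearest(x, centers, p) for x in pixels]
    return centers, labels


def lp_error(pixels: List[Color], centers: List[Color], labels: List[int], p: float) -> float:
    """Lp norm of the per-channel quantization error, normalized by sample count."""
    total = sum(lp_dist(x, centers[lab], p) for x, lab in zip(pixels, labels))
    return (total / (3 * len(pixels))) ** (1 / p)


def select_num_colors(pixels: List[Color], max_k: int, threshold: float, p: float = 4.0):
    """Smallest palette size (1..max_k) whose Lp error is <= threshold (hypothetical rule)."""
    for k in range(1, max_k + 1):
        centers, labels = quantize(pixels, k, p)
        err = lp_error(pixels, centers, labels, p)
        if err <= threshold:
            break
    return k, centers, err


if __name__ == "__main__":
    gray, red = (128.0, 128.0, 128.0), (255.0, 0.0, 0.0)
    pixels = [gray] * 95 + [red] * 5  # a large gray area with a small red mark
    k, palette, err = select_num_colors(pixels, max_k=4, threshold=1.0)
    print(k, palette, err)
```

With a single palette color, the red channel settles at about 163 for p = 4 versus the arithmetic mean of about 134 for p = 2, so the rare red pixels pull harder on the palette as p grows. In the example, the Lp error first drops to zero at two colors, which is where the threshold rule stops.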
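The abstract does not say which local thresholding rule produces the initial binarization. As a stand-in, the sketch below uses Sauvola's well-known rule, T = m(1 + k(s/R - 1)), where m and s are the mean and standard deviation of a square window computed from integral images. The window radius (7), k = 0.2, and R = 128 are assumptions, and only this first stage is shown; the one-pass local classification and component-inversion steps are not reproduced.

```python
import math
from typing import List

Image = List[List[float]]


def integral(img: Image, square: bool = False) -> List[List[float]]:
    """(h+1) x (w+1) summed-area table of the pixel values (or their squares)."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            v = img[y][x]
            row += v * v if square else v
            S[y + 1][x + 1] = S[y][x + 1] + row
    return S


def box_sum(S: List[List[float]], y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 using the summed-area table."""
    return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]


def sauvola_binarize(img: Image, radius: int = 7, k: float = 0.2, R: float = 128.0) -> List[List[int]]:
    """Return 1 for dark (foreground/text) pixels and 0 for background."""
    h, w = len(img), len(img[0])
    S, S2 = integral(img), integral(img, square=True)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window clipped to the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            n = (y1 - y0) * (x1 - x0)
            m = box_sum(S, y0, x0, y1, x1) / n
            var = max(box_sum(S2, y0, x0, y1, x1) / n - m * m, 0.0)
            t = m * (1 + k * (math.sqrt(var) / R - 1))
            out[y][x] = 1 if img[y][x] < t else 0
    return out
```

Integral images make each local mean and variance an O(1) lookup, so the whole pass is linear in the number of pixels regardless of window size.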
Keywords/Search Tags: Document image, Algorithm, Binarization