Exploring a space of document image classifiers

Posted on: 2006-12-17
Degree: M.Sc
Type: Thesis
University: Queen's University at Kingston (Canada)
Candidate: Chen, Nawei
Full Text: PDF
GTID: 2458390008961301
Subject: Computer Science
Abstract/Summary:
Document image classification is an important step in office automation, digital libraries, and other document image analysis applications. Document image classification systems are highly diverse: they differ in the problems they solve, in how they use training data to construct models of document classes, and in their choice of document features and recognition algorithms. We identify important issues in classification problem definition, classifier design, and performance evaluation. To explore a space of existing document image classifiers and investigate these issues, we build a prototype, SEDIC (a System for Exploration of Document Image Classifiers). We conduct experiments applying SEDIC to publicly available data sets as well as to our own data set. We demonstrate that SEDIC can be used to compare the performance of classifiers that differ in their choice of image features and algorithms. In our experiments, the best-performing feature set and algorithm vary with the classification problem; no single feature set or algorithm performs best in all situations.
Keywords/Search Tags: Document image, Classification, Classifiers