A Semi Supervised Framework for Handwritten Document Analysis

Posted on:2015-12-27

Degree:Ph.D

Type:Dissertation

University:State University of New York at Buffalo

Candidate:Porwal, Utkarsh

GTID:1478390020951145

Subject:Computer Science

Abstract/Summary:

We have investigated machine learning approaches for various tasks in handwritten document analysis, such as writer identification and text recognition. Most of the techniques explored by the community thus far are supervised learning approaches, which require a large number of labeled data samples for training; this is unrealistic in many handwritten document analysis applications. Moreover, these techniques often rely on hand-crafted feature design, which fails to capture latent yet statistically significant information present in the data. Another challenge is selecting a suitable model, since learning algorithms inherently have different inductive biases and computational complexities.

We have developed semi-supervised approaches suitable for the handwritten document analysis tasks identified and extensible to additional applications. These approaches address the aforementioned issues through three contributions. (i) We observe that while labeled data is typically scarce, unlabeled data is plentiful and can be leveraged to improve machine learning algorithms. We propose a co-training based approach that builds on this observation by utilizing large quantities of unlabeled data. This approach leads to state-of-the-art performance in writer identification, where we achieve high precision in labeling unlabeled documents. (ii) We have captured latent information present in handwritten document collections without any human feedback or additional labeling. To this end, we have outlined a completely semi-supervised method for learning domain-specific contextual information by dividing the main task into several related subtasks to extract information that is otherwise difficult to obtain. To the best of our knowledge, this is the first attempt to use a structural learning framework for handwritten document analysis. (iii) Towards selecting the optimal model, we have investigated new ensemble methods to overcome the limitations of a single learner. To account for the presence of errors in the learning process, we propose an error-correcting code based technique that uncovers the correct class information by injecting redundant information into the learning process.

We have demonstrated the efficacy of the proposed solutions on the publicly available IAM English database, the handwritten Arabic PAW database and the IBM-UB database for the tasks of writer identification and handwritten text recognition. Our methods achieve superior performance over purely supervised approaches on all these datasets, which underscores the significance of semi-supervised approaches for various handwritten document analysis tasks.
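
The error-correcting code idea in contribution (iii) can be illustrated with a minimal sketch. It is not the dissertation's implementation. The codebook, the writer labels and the function names below are all hypothetical, and the sketch only shows the general principle of error-correcting output codes: each class is given a redundant binary codeword, one binary learner predicts each bit, and the class is recovered by nearest-codeword decoding even if a learner makes a mistake.

```python
# Illustrative sketch of error-correcting output code (ECOC) decoding.
# Assumption: the codebook and writer labels are invented for this example
# and do not come from the dissertation.

# Each class (writer) is assigned a 7-bit codeword. Each bit position stands
# for the output of one binary learner. The codewords are chosen so that any
# two of them differ in 4 positions, which allows one wrong bit to be corrected.
CODEBOOK = {
    "writer_a": (0, 0, 0, 0, 0, 0, 0),
    "writer_b": (1, 1, 1, 1, 0, 0, 0),
    "writer_c": (1, 1, 0, 0, 1, 1, 0),
    "writer_d": (1, 0, 1, 0, 1, 0, 1),
}


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook=CODEBOOK):
    """Smallest Hamming distance between any two distinct codewords."""
    words = list(codebook.values())
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


if __name__ == "__main__":
    # Suppose the true class is writer_c and the third binary learner errs.
    noisy_prediction = (1, 1, 1, 0, 1, 1, 0)
    print(decode(noisy_prediction))
```

With a minimum codeword distance of d, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong bits, so the redundancy in the codebook is what buys robustness to individual learner errors.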

Keywords/Search Tags:

Handwritten document analysis, Approaches, Supervised, Writer identification, Machine learning

Related items

1	Writer identification of Arabic handwritten documents
2	On-line Handwritten Chinese Characters Analysis And Recognition Based On Deep Learning
3	Signature Verification And Writer Identification Based On Deep Learning And Domain Knowledge
4	Topic Modeling Approaches For Supervised Document Classification
5	Offline Handwritten Document Recognition System For Mobile Platforms
6	Document Analysis Of The Financial Bill And Recognition Of Handwritten Digit
7	Machine learning approaches for dealing with limited bilingual training data in statistical machine translation
8	Deep Neural Networks Based Offline Writer Identification
9	Study On Algorithms Of Offline Handwritten Chinese Identification And Recognition
10	Machine learning for person identification with applications in forensic document analysis