Classification in the presence of missing covariates

Posted on:2008-04-16

Degree:Ph.D

Type:Thesis

University:Carleton University (Canada)

Candidate:Montazeri-Najafabadi, Zahra

GTID:2448390005956677

Subject:Statistics

Abstract/Summary:

This thesis studies the problem of classification in the presence of missing covariates and proposes methods for constructing consistent classifiers under various missing patterns. We derive representations for the best classifier when some of the covariates can be missing; this is done without imposing any assumptions on the underlying missing probability mechanism. Furthermore, without assuming any MAR-type (missing at random) conditions, we also construct consistent classifiers that do not require any imputation-based techniques. When the MAR assumption holds, we employ kernel-based imputation and Horvitz-Thompson-type inverse weighting approaches to handle the missing covariates. The validity of the resulting classifiers is assessed via both theory and numerical examples.

The thesis is organized as follows. Chapter 1 contains the basic definitions and concepts needed throughout the thesis; it introduces some specific notation, discusses a few preliminary results, and briefly reviews the literature on the problem. Chapter 2 gives two representations of the functional form of the optimal classifier when a block of covariates is missing; it also proposes consistent parametric and nonparametric classifiers. Chapter 3 introduces the Swiss-cheese model, in which missing values can occur anywhere among the covariates; it derives the best classifier and constructs consistent classifiers under various conditions. Chapter 4 applies the results of Chapters 2 and 3 to find consistent classifiers based on histogram rules as well as general partitioning rules. Chapter 5 uses the least-squares approach to perform nonparametric classification in the presence of missing covariates, employing both kernel-based imputation and Horvitz-Thompson-type inverse weighting. Using the theory of empirical processes, the performance of the resulting classifiers is assessed by obtaining exponential bounds on the deviations of their conditional misclassification errors from that of the best classifier. Chapter 6 presents simulation studies that illustrate the proposed methods on artificial examples as well as real data sets. Finally, Chapter 7 suggests some possible directions for future study.
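
To make the inverse weighting idea mentioned in the abstract concrete, the following is a minimal illustrative sketch and not the estimator developed in the thesis. It assumes a binary label, a Gaussian kernel, and a simple MAR-type mechanism in which the probability that a row is fully observed depends only on the always-observed class label. The function name `ipw_kernel_classifier`, the bandwidth `h`, and this missingness model are all hypothetical choices made for illustration.

```python
import numpy as np

def ipw_kernel_classifier(V, y, observed, h=0.5):
    """Sketch of a Horvitz-Thompson-type weighted kernel classifier.

    V        : (n, d) covariates; rows with observed == False may contain NaN
    y        : (n,) labels in {0, 1}, assumed always observed
    observed : (n,) boolean, True when the whole covariate row is available
    h        : kernel bandwidth (illustrative default, not tuned)

    Assumption: P(observed | covariates, label) depends on the label only,
    and each class has at least one fully observed row.
    """
    V = np.asarray(V, dtype=float)
    y = np.asarray(y, dtype=int)
    observed = np.asarray(observed, dtype=bool)

    # Estimate the selection probability P(observed | Y = k) for k = 0, 1.
    p_hat = np.array([observed[y == k].mean() for k in (0, 1)])

    # Keep complete cases only, each weighted by 1 / estimated probability.
    Vc, yc = V[observed], y[observed]
    wc = 1.0 / p_hat[yc]

    def predict(v_new):
        v_new = np.atleast_2d(np.asarray(v_new, dtype=float))
        # Squared distances between query points and complete cases.
        d2 = ((v_new[:, None, :] - Vc[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-0.5 * d2 / h**2)
        # Weighted kernel regression estimate of E[Y | covariates].
        num = (K * (wc * yc)).sum(axis=1)
        den = (K * wc).sum(axis=1)
        m_hat = np.divide(num, den, out=np.full_like(num, 0.5), where=den > 0)
        # Plug-in rule: predict class 1 when the estimate exceeds 1/2.
        return (m_hat > 0.5).astype(int)

    return predict
```

The weights compensate for the fact that complete cases over-represent the class that is observed more often; with a richer missingness model, `p_hat` would instead be estimated as a function of all always-observed variables.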

Keywords/Search Tags:

Missing, Presence, Chapter, Classification, Consistent classifiers, Thesis

Related items

1	Research On Incomplete Data Classification Based On Multiple Classifiers
2	An Expectation Maximization Application For Decision Tree Classifiers On Datasets With Missing Values
3	Bayes Classifiers Research Based On The Incomplete Data
4	The modularity thesis, connectionist thesis, and apparent motion perception: An examination of two competing theories and underlying philosophical assumptions
5	Research On Multi-Dimension Bayesian Network Classifiers Based On Feature Selection
6	Research On Commit Classification Based On Combined Features And Combined Classifiers
7	Imbalanced Binary Classification On Hospital Readmission Data With Missing Value
8	Research On Missing Value Imputation Method Based On Mixed Information System
9	Research And Application On Data Mining Classification Arithmetic Based On Multiple Classifiers Fusion
10	Features and statistical classifiers for face image analysis