Using principal component analysis (PCA) to obtain auxiliary variables for missing data in large data sets

Posted on:2013-02-04

Degree:Ph.D

Type:Dissertation

University:University of Kansas

Candidate:Howard, Waylon J

GTID:1458390008467769

Subject:Psychology

Abstract/Summary:

The purpose of this dissertation is to address an important issue in the imputation of missing data in large data sets. The issue can arise in any analysis in which auxiliary variables are used to inform a modern missing data handling procedure (e.g., full information maximum likelihood [FIML], multiple imputation [MI]) in order to support the missing at random assumption, reduce bias, and decrease standard errors. The problem is that researchers recommend an "inclusive strategy" in which as many auxiliary variables as possible are included. However, the model becomes more complex with each additional auxiliary variable, so there is a practical limit to the number of auxiliary variables that can be successfully included. Beyond this limit, the model will fail to converge. Large data projects can present a challenge because they may offer hundreds of potential auxiliary variables to inform the missing data handling procedure, especially when non-linear information is included. The dissertation is divided into the following sections: 1) a brief discussion of the issue of missing data; 2) a review of the history of missing data, including theory and existing solutions for handling missingness; 3) an assessment of the use of auxiliary variables in missing data handling; 4) a discussion of convergence failure with modern missing data methods; 5) a basic introduction to principal component analysis; 6) the introduction of an alternative strategy for dealing with a large number of auxiliary variables; and finally, 7) a demonstration of the potential of using principal component scores as auxiliary variables, by applying the approach to the analysis of simulated and empirical data.
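The general idea of replacing many auxiliary variables with a few principal component scores can be illustrated with a minimal sketch. This is not the dissertation's procedure; it assumes the auxiliary variables are fully observed and non-constant, uses NumPy, and works on randomly generated data. The function name `pca_auxiliary_scores`, the choice of three components, and the simulated factor structure are all hypothetical.

```python
import numpy as np


def pca_auxiliary_scores(aux, n_components):
    """Return the first `n_components` principal component scores of `aux`.

    aux: (n_cases, n_variables) array of auxiliary variables.
    Assumption: no missing values and no constant columns in `aux`.
    """
    aux = np.asarray(aux, dtype=float)
    # Standardize so that no variable dominates because of its scale.
    z = (aux - aux.mean(axis=0)) / aux.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of vt are the component loadings.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # Project each case onto the retained components.
    scores = z @ vt[:n_components].T
    # Proportion of total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical data: 200 auxiliary variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
n_cases, n_aux = 500, 200
factors = rng.normal(size=(n_cases, 3))
loadings = rng.normal(size=(3, n_aux))
aux = factors @ loadings + rng.normal(scale=0.5, size=(n_cases, n_aux))

# 200 candidate auxiliary variables are reduced to 3 score columns.
scores, explained = pca_auxiliary_scores(aux, n_components=3)
```

In this sketch the three columns of `scores`, rather than the 200 original variables, would be supplied as auxiliary variables to whatever MI or FIML software is in use. The component scores are mutually uncorrelated by construction, which is one reason a small set of them is easier for an imputation model to handle than a large set of overlapping variables.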

Keywords/Search Tags:

Data, Auxiliary variables, Principal component, Issue

Related items

1	On using block principal component analysis for reducing gene-expression data dimensions
2	Computer-based process monitoring/fault detection using principal component analysis
3	Construction Method Of Principal Component Networks And Its Application
4	Application Of Principal Component Analysis And Clustering In Science And Technology Data Analysis
5	Study On Emotional Event-related Potential By EEG Data
6	Research On Face Recognition Algorithm Based On Principal Component Analysis
7	Comparative Study On Sparse Principal Component Analysis
8	Research On Feature Extraction Based On Principal Component Analysis
9	Principal component analyses for tree structured objects