A comparison of classification error rates using pseudo-continuous variate

Posted on: 1998-12-05
Degree: Ph.D
Type: Dissertation
University: University of South Carolina
Candidate: Yoon, Stephen Seokheung
Full Text: PDF
GTID: 1468390014979909
Subject: Biostatistics
Abstract/Summary:
In this study, classification error rates for a dichotomized variable measuring potential violence, taken from the 1995 South Carolina Youth Risk Behavior Survey, are investigated and compared using multiple linear regression, multiple logistic regression, two-group discriminant analysis with and without the normality assumption, and classification and regression trees (CART). This multivariate survey instrument assesses six priority health-risk behaviors: intentional and unintentional injuries, tobacco use, alcohol and other drug abuse, sexual behaviors, physical inactivity, and dietary excesses and imbalances.

Multiple linear and logistic regressions are applied both to the raw data containing all variables and to the data after the number of variables has been reduced by principal component analysis. Error rates are estimated by two methods: resubstitution and training-test. The error rates from the two methods are nearly the same for both regressions.

For multiple linear and logistic regression, a theorem is presented and proved stating that the optimal cutoff value, the one that minimizes the misclassification probability, requires the two continuous densities to be equal. A program that obtains the optimal cutpoint and the smallest error rate by an iteration technique is given. The results indicate that the derived cutoff values produce smaller classification error rates than the default cutoffs.

Discriminant analyses and CART are applied only to the principal components. Only CART shows some discrepancy in error rates between the resubstitution and training-test methods.

For multiple linear regression classification using principal components, the error rates are compared across various training-test subsamples. The error rates fluctuate among subsamples.

The bootstrap technique is used to estimate the standard error of the error rate for multiple linear regression, multiple logistic regression, and nonparametric discriminant classification, using the training-test method on the principal components data, to check statistical accuracy.
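The cutoff search described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's program: the abstract does not describe the iteration technique, so an exhaustive scan over candidate cutoffs stands in for it. The function names `error_rate` and `best_cutoff`, the simulated scores, and the default cutoff of 0.5 are all assumptions made for illustration.

```python
import random

def error_rate(scores, labels, cutoff):
    """Share of cases misclassified when score > cutoff predicts group 1."""
    wrong = sum((s > cutoff) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(scores)

def best_cutoff(scores, labels):
    """Scan one candidate per gap between sorted scores; return (cutoff, error)."""
    pts = sorted(set(scores))
    candidates = [pts[0] - 1.0]                       # classify everything as group 1
    candidates += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    candidates.append(pts[-1] + 1.0)                  # classify everything as group 0
    return min(((c, error_rate(scores, labels, c)) for c in candidates),
               key=lambda pair: pair[1])

# Hypothetical data (not from the survey): fitted scores for 700 group-0
# and 300 group-1 cases, drawn from two normal distributions.
rng = random.Random(0)
labels = [0] * 700 + [1] * 300
scores = [rng.gauss(0.2, 0.15) if y == 0 else rng.gauss(0.6, 0.15) for y in labels]

default_err = error_rate(scores, labels, 0.5)         # assumed default cutoff
cutoff, best_err = best_cutoff(scores, labels)
print(f"default cutoff 0.5: error {default_err:.3f}")
print(f"searched cutoff {cutoff:.3f}: error {best_err:.3f}")
```

Both error rates here are resubstitution estimates, since the cutoff is chosen and evaluated on the same cases. In this simulated setup the prior-weighted group densities cross near 0.45, so the searched cutoff would be expected to land in that neighborhood rather than at 0.5.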
Keywords/Search Tags:Error rates, Using, Linear and logistic regressions, Principal components, Multiple linear, Training-test