
Minimum sample sizes for two-group linear and quadratic discriminant analysis with rare population

Posted on: 2010-01-08
Degree: Ph.D.
Type: Dissertation
University: University of Northern Colorado
Candidate: Zavorka, Shannon Williams
GTID: 1448390002977392
Subject: Statistics
Abstract/Summary:
The purpose of this study was to investigate the performance of Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) with regard to rare populations. This study provides minimum sample size recommendations for performing a two-group linear or quadratic discriminant analysis when one group is rare (at most 15% of the population) under a variety of conditions. Sample size recommendations were determined through a series of Monte Carlo simulations run in the SAS System. For each simulation, data were generated from two multivariate normal distributions. LDA (or QDA) was then performed on the data using the leave-one-out (L-O-O) procedure, and sensitivity (Sn) and specificity (Sp) values were computed. This process was repeated 2,000 times, and the sample size was increased until 95% of the 2,000 sensitivity and specificity values met or exceeded the specified minimums. The minimum sensitivity and specificity levels used in this study were Sn = .85/Sp = .70, Sn = .75/Sp = .65, and Sn = .70/Sp = .60, and the three rarity levels were 5%, 10%, and 15% of the population. Several conclusions regarding sample size were drawn from the results. First, the greater the separation between the groups, the smaller the required sample size. Second, as the number of predictors, k, increases, the required sample size increases. Likewise, required sample sizes increase as the target sensitivity and specificity values increase or as the group becomes rarer. For maximum group overlap, the required sample size increased as the correlation between the predictor variables increased; conversely, for minimum group overlap, larger correlation values resulted in smaller required sample sizes. The recommended minimum sample sizes for the scenarios examined in this study range from six to more than one thousand.
General sample size recommendations are presented for various rarity levels, sensitivity and specificity levels, group separation distances, predictor variable correlation values, and for maximum and minimum overlap between the two groups.
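The simulation loop summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's SAS program: it uses NumPy, equal prior probabilities, a covariance matrix with unit variances and a common correlation rho, and a rare-group mean shifted by delta on every predictor. The function and parameter names (simulate_groups, loo_lda_sn_sp, minimum_sample_size, delta, rho, step, max_n) are hypothetical. It also reads the stopping rule as requiring that at least 95% of the replicate Sn values, and separately 95% of the Sp values, reach their minimums. A QDA variant would replace the pooled covariance with each group's own covariance matrix.

```python
import numpy as np


def simulate_groups(n_total, rarity, k, delta, rho, rng):
    """Draw one data set: a common group (y = 0) and a rare group (y = 1).

    Assumptions (not from the study): both groups share a covariance matrix
    with unit variances and a common correlation rho, and the rare group's
    mean is shifted by delta on every one of the k predictors.
    """
    n1 = max(2, int(round(rarity * n_total)))  # rare group keeps >= 2 cases
    n0 = n_total - n1
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
    x0 = rng.multivariate_normal(np.zeros(k), cov, size=n0)
    x1 = rng.multivariate_normal(np.full(k, delta), cov, size=n1)
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return x, y


def loo_lda_sn_sp(x, y):
    """Leave-one-out LDA with equal priors; returns (sensitivity, specificity)."""
    n = len(y)
    pred = np.empty(n, dtype=int)
    for i in range(n):
        keep = np.arange(n) != i  # hold out case i, fit on the remaining cases
        xt, yt = x[keep], y[keep]
        m0 = xt[yt == 0].mean(axis=0)
        m1 = xt[yt == 1].mean(axis=0)
        r0 = xt[yt == 0] - m0
        r1 = xt[yt == 1] - m1
        # Pooled within-group covariance of the n - 1 training cases.
        pooled = (r0.T @ r0 + r1.T @ r1) / (n - 1 - 2)
        w = np.linalg.solve(pooled, m1 - m0)
        # Equal-prior linear rule: rare group if past the midpoint of the means.
        pred[i] = int((x[i] - (m0 + m1) / 2) @ w > 0)
    sn = float(np.mean(pred[y == 1] == 1))  # rare cases classified as rare
    sp = float(np.mean(pred[y == 0] == 0))  # common cases classified as common
    return sn, sp


def minimum_sample_size(rarity, k, delta, rho, min_sn, min_sp,
                        reps=2000, start=None, step=1, max_n=2000, seed=0):
    """Smallest total n on the search grid meeting the 95% criterion, else None."""
    rng = np.random.default_rng(seed)
    # Starting at k + 4 gives the leave-one-out pooled covariance n - 3 >= k + 1
    # degrees of freedom, so it is full rank for continuous data.
    n = k + 4 if start is None else start
    while n <= max_n:
        res = np.array([loo_lda_sn_sp(*simulate_groups(n, rarity, k, delta, rho, rng))
                        for _ in range(reps)])
        # Interpretation: 95% of Sn values and 95% of Sp values must reach the minimums.
        if np.mean(res[:, 0] >= min_sn) >= 0.95 and np.mean(res[:, 1] >= min_sp) >= 0.95:
            return n
        n += step
    return None
```

For example, `minimum_sample_size(0.05, 4, 1.0, 0.3, 0.85, 0.70)` would search for the smallest total sample size for a 5% rare group with four predictors under these assumptions. Because every replicate refits the model once per case, a full 2,000-replicate search is slow in plain NumPy, so a coarser `step` is a practical first pass.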