
Factor analysis: The effects of distribution type, number of factors, factor loadings, number of variables per factor and sample size on the rules used to determine the number of factors to retain

Posted on: 2003-02-16
Degree: Ph.D
Type: Dissertation
University: Boston University
Candidate: Dukes, Kimberly Ann
Full Text: PDF
GTID: 1460390011989870
Subject: Statistics
Abstract/Summary:
The primary goal of factor analysis (FA) is to understand the underlying structure or covariation in a set of distinct items. There are several steps involved in performing FA. Here we focus on the most important step, determining the appropriate number of factors to retain.

Based on the FA population model, we used Monte Carlo simulation to estimate the accuracy of twelve popular rules. Six component analysis (CA)-based rules were tested, including lambda ≥ 1, the minimum average partial method, Bartlett's chi-square test, Horn's Parallel Analysis (PA), 80% explained variance and Image lambda ≥ 1. Six factor analysis (FA)-based rules were tested, including Kaiser-Guttman's eigenvalue (% variance explained) lambda ≥ 0 and lambda ≥ 1, Cureton and D'Agostino's lambda ≥ (n^0.6)/15 (where n = the number of variables), lambda ≥ mean squared multiple correlation (SMC), 100% explained variance and a maximum likelihood based chi-square test. Five conditions of simulation were investigated: distribution of the items (normal, ordinal, binomial), the true number of factors (2 to 10 by 2), strengths of the factor loadings (0.3 to 0.9 by 0.1), the number of variables loading on each factor (3, 4, 5, 6, 9, 12) and the number of observations per variable (5, 10, 20, 30).

The results indicated that among the CA-based rules, PA produced the highest accuracy (the true number of factors was retained more than 87% of the time) and among the FA-based rules, the lambda ≥ (n^0.6)/15 rule produced the highest accuracy (>77%) over all conditions of simulation. The default rules in popular statistical computing packages (e.g., SAS, S-Plus) are not the most accurate. The lambda ≥ 1 CA-based rule and the 100% variance FA-based rule were <57% and <48% accurate, respectively, over all conditions of simulation. The distribution of the items had minimal effect on the rules' performance. The magnitude of the factor loadings generally had the most impact on the accuracy of the rules. The remaining conditions of simulation had less clear effects on the rules' performance. Optimal results are achieved with PA, assuming adequate computer resources, and as an alternative the lambda ≥ (n^0.6)/15 rule is recommended. Best results are achieved with factor loadings of 0.5 or greater and at least ten observations per variable.
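
To make the comparison concrete, the following is a minimal sketch of one simulation cell for two of the CA-based rules named above, lambda ≥ 1 and Horn's Parallel Analysis. It is not the dissertation's code. The function names, the orthogonal simple-structure loading pattern, the use of NumPy, and the choice to compare observed eigenvalues against the mean eigenvalues of random normal data are all assumptions made for illustration.

```python
import numpy as np


def simulate_items(n_factors, vars_per_factor, loading, n_obs, rng):
    """Draw normal items from an orthogonal common-factor model.

    Assumption: simple structure, i.e. each item loads on exactly one
    factor with the same loading, and unique variance is 1 - loading**2.
    """
    n_vars = n_factors * vars_per_factor
    loadings = np.zeros((n_vars, n_factors))
    for j in range(n_factors):
        rows = slice(j * vars_per_factor, (j + 1) * vars_per_factor)
        loadings[rows, j] = loading
    factors = rng.standard_normal((n_obs, n_factors))
    unique = rng.standard_normal((n_obs, n_vars)) * np.sqrt(1.0 - loading**2)
    return factors @ loadings.T + unique


def correlation_eigenvalues(x):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(x, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_rule(eigenvalues):
    """CA-based lambda >= 1 rule: count eigenvalues of at least 1."""
    return int(np.sum(eigenvalues >= 1.0))


def parallel_analysis(x, rng, n_reps=100):
    """Parallel analysis against mean eigenvalues of random normal data.

    Assumption: the mean is used as the reference; percentile-based
    variants (e.g. the 95th percentile) are also common.
    """
    n_obs, n_vars = x.shape
    observed = correlation_eigenvalues(x)
    random_eigs = np.array([
        correlation_eigenvalues(rng.standard_normal((n_obs, n_vars)))
        for _ in range(n_reps)
    ])
    exceeds = observed > random_eigs.mean(axis=0)
    # Retain leading components until the first one that fails the check.
    return n_vars if exceeds.all() else int(np.argmin(exceeds))


# One cell of the design: 2 true factors, 6 variables per factor,
# loadings of 0.8, and 30 observations per variable (12 * 30 = 360).
rng = np.random.default_rng(0)
items = simulate_items(n_factors=2, vars_per_factor=6, loading=0.8,
                       n_obs=360, rng=rng)
print("lambda >= 1 retains:", kaiser_rule(correlation_eigenvalues(items)))
print("parallel analysis retains:", parallel_analysis(items, rng))
```

Repeating such a cell many times across the five conditions, and recording how often each rule returns the true number of factors, would give accuracy estimates of the kind reported above. The ordinal and binomial item distributions would need an extra discretization step that this sketch omits.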
Keywords/Search Tags:Factor, Rules, Per, Distribution, Variables