Detection of multivariate mean vector and covariance matrix outliers in behavioral sciences data

Posted on: 2006-01-28
Degree: Ph.D.
Type: Thesis
University: The University of North Carolina at Chapel Hill
Candidate: Carrig, Madeline Marie
GTID: 2458390008950800
Subject: Psychology
Abstract/Summary:
The present investigation was a Monte Carlo experiment focused on the evaluation of discordancy tests for the detection of outlier observations drawn from a population having a mean vector and/or covariance matrix that differ from those of the inlier population. A critical review and synthesis of previous theoretical and empirical findings was undertaken. A primary aim was the examination of outlier detection methods under a range of conditions typical of social and behavioral sciences research.

Statistics under investigation included the Mahalanobis D², multivariate kurtosis statistic (Mardia, 1970, 1974), hat value, MCD robust distance, SHV robust distance (Egan & Morgan, 1998), Bacon MLD (Bacon, 1995), two local influence approaches (Poon & Poon, 2002), and a new metric proposed by the author. Metrics were applied using a statistical criterion (with critical values derived within a preliminary simulation study), a "natural drop" approach, and a five percent trim approach. Performance criteria included hit rate, false alarm rate, mean vector bias, covariance matrix bias, and frequency of computationally problematic "outlier" identifications. Experimental factors included sample size, sample dimensionality, degree of inlier population non-normality, outlier fraction, outlier separation, and outlier type.

Results indicated that use of the trim approach tended to optimize false alarm rate and bias performance outcomes, whereas adoption of the natural drop technique most frequently maximized metric ability to successfully identify outlying observations. No single outlier detection metric or combination of metrics proved consistently superior in terms of rates of accurate outlier identification or ability to minimize mischaracterization of sample estimates. A wide range of performances was observed, with mean (and even optimum) outcomes falling far short of ideal, and with some performance outcomes falling short even of those obtained from a random outlier selection process. Findings suggest that the applied researcher should strive to ensure that the particular outlier detection metric and approach to implementation selected are consistent with his/her initial aims in investigating the possibility of outliers. Especially if the investigator's goal is to fit a substantive model to an observed data sample, the choice of whether to reject, downweight, accommodate, or retain metric-identified outliers should be made with caution.
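As a rough illustration of two ideas named in the abstract, the Mahalanobis D² metric and the five percent trim approach, the following Python sketch flags the observations with the largest distances in one simulated sample and then scores the result by hit rate and false alarm rate. It is not the thesis's code or simulation design: the function names, the sample size, the dimensionality, the outlier fraction, and the mean shift of 6 are all assumptions chosen only for demonstration.

```python
import numpy as np


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance (n - 1 denominator)
    # Solve cov @ z = centered.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


def trim_flags(d2, fraction=0.05):
    """Flag the given fraction of observations with the largest distances."""
    n_flag = int(np.ceil(fraction * len(d2)))
    flags = np.zeros(len(d2), dtype=bool)
    flags[np.argsort(d2)[-n_flag:]] = True
    return flags


def hit_and_false_alarm_rates(flags, is_outlier):
    """Share of true outliers flagged, and share of inliers flagged."""
    hit_rate = flags[is_outlier].mean()
    false_alarm_rate = flags[~is_outlier].mean()
    return hit_rate, false_alarm_rate


# Hypothetical simulated sample: 190 inliers plus 10 mean-shifted outliers.
rng = np.random.default_rng(0)
n_inliers, n_outliers, p = 190, 10, 3
inliers = rng.standard_normal((n_inliers, p))
outliers = rng.standard_normal((n_outliers, p)) + 6.0  # assumed mean shift
X = np.vstack([inliers, outliers])
is_outlier = np.arange(len(X)) >= n_inliers

d2 = mahalanobis_d2(X)
flags = trim_flags(d2, fraction=0.05)
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(flags, is_outlier)
print(f"hit rate = {hit_rate:.2f}, false alarm rate = {false_alarm_rate:.2f}")
```

Because the mean and covariance here are estimated from the contaminated sample itself, outliers can inflate those estimates and partly hide themselves. That is one reason a study of this kind would also consider robust distances such as the MCD.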
Keywords/Search Tags:Outlier, Detection, Covariance matrix, Mean vector, Metric, Sample