
Nonparametric multivariate outlier detection methods, with applications

Posted on: 2006-03-26
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Dang, Xin
Full Text: PDF
GTID: 1458390008463712
Subject: Statistics
Abstract/Summary:
The literature on multivariate outlier identification largely focuses on using some form of robust Mahalanobis distance to classify outliers. However, this requires a strong assumption: that the underlying model is normal, or at least elliptically symmetric, which in many situations cannot realistically be assumed. To go beyond elliptical models, we introduce a nonparametric approach that uses depth functions to obtain inner regions which can, in general, follow the actual geometric structure and shape of the given data.

In this dissertation, depth-based outlier regions are constructed. To study the robustness of these nonparametric outlier identifiers, notions of masking and swamping breakdown points are formulated, and general properties are derived under natural assumptions on the depth function. An interesting feature, for example, is that such outlier identifiers handle the masking effects of extreme outliers more easily than those of less extreme ones. This contrasts with the robustness of statistical estimators, which perform worse in the presence of more extreme outliers. Particular results are obtained for the familiar halfspace, simplicial, spatial, and projection depth functions.

A second major robustness property is studied through influence function analysis. Influence functions are obtained for the spatial, simplicial, and generalized Tukey depth functions. These influence functions are compared with those of other depth functions, and the comparison is illustrated.

It is well known that if an outlier has one or more missing component values, imputation methods tend to impute non-extreme values, which makes the outlier less extreme and less likely to be detected. In this dissertation, as a major application, our nonparametric depth-based outlier detectors are used as criteria in a study comparing several established methods for imputing missing data, applied to actual clinical laboratory data sets. Two kinds of criteria based on outlyingness measures are developed: one is called "outlier recovery", the other a "relative accuracy measure". Three other outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized PCA, are also included in the study. Consequently, the study allows a comparison not only of the imputation methods but also of the outlier detection methods.
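To make the idea of a depth-based outlier identifier concrete, the following is a minimal Python sketch built on the sample spatial depth, one of the depth functions named in the abstract. It is an illustration under stated assumptions, not the dissertation's procedure: the function names `spatial_outlyingness` and `flag_outliers`, the cutoff of 0.9, and the synthetic data are all hypothetical. The sketch uses the standard definition of sample spatial outlyingness as the norm of the average unit vector pointing from the data points to the point being scored; spatial depth is one minus that value.

```python
import numpy as np


def spatial_outlyingness(x, data):
    """Sample spatial outlyingness of point x relative to the rows of data.

    This is the norm of the average unit vector from each data point to x,
    so it lies in [0, 1]. Spatial depth is 1 minus this value.
    """
    diffs = np.asarray(x, dtype=float) - np.asarray(data, dtype=float)
    norms = np.linalg.norm(diffs, axis=1)
    nonzero = norms > 0  # a data point equal to x contributes a zero vector
    units = diffs[nonzero] / norms[nonzero, None]
    return float(np.linalg.norm(units.sum(axis=0)) / len(diffs))


def flag_outliers(data, threshold=0.9):
    """Score every row against the whole sample and flag high scores.

    The threshold is a hypothetical choice for illustration only.
    """
    data = np.asarray(data, dtype=float)
    scores = np.array([spatial_outlyingness(p, data) for p in data])
    return scores, scores > threshold


# Synthetic example (assumed data): a standard normal cloud plus one
# planted point far from the bulk of the sample.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 2))
sample = np.vstack([cloud, [[10.0, 10.0]]])

scores, flags = flag_outliers(sample)
center_score = spatial_outlyingness([0.0, 0.0], cloud)
```

Because the score depends only on the directions from the data to the point, it needs no covariance estimate and no elliptical-symmetry assumption, which is the motivation the abstract gives for depth-based identifiers. A point deep inside the cloud sees unit vectors that largely cancel and so scores near 0, while the planted point sees nearly parallel unit vectors and scores near 1.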
Keywords/Search Tags: Outlier, Methods, Nonparametric, Mahalanobis distance, Depth functions