
On the Detection Method of Multidimensional Outliers

Posted on: 2012-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: G R Zhang
GTID: 2230330362453135
Subject: Humanities and sociology
Abstract/Summary:
Statistical problems are solved on the basis of statistical data, but during statistical analysis, outliers in the data can introduce unexpected deviations into the results, so a method for detecting outliers is needed.

Previous studies of outlier detection have mostly remained at the stage of one-dimensional data. With the continuing development of statistical techniques, however, multivariate statistics has gradually gained popularity, and many practical problems can only be solved with multivariate statistical methods. In multivariate analysis the data are no longer one-dimensional but multidimensional, and the number of dimensions varies with the purpose of the analysis. Outliers in multidimensional data also distort statistical results and make them deviate from the actual situation, so methods are needed for detecting outliers in multidimensional data, and outliers should be detected and handled before a multidimensional statistical analysis is carried out.

To explore these problems and find effective detection methods, the paper builds on an analysis of previous research and uses the literature research method, comparative analysis, and exploratory experiments, in an attempt to find a general-purpose method for detecting multidimensional outliers.

To detect multidimensional outliers, the paper tries three methods (the direct decomposition method, the local average distance method, and the multidimensional scaling method), each of which maps multidimensional data to one dimension, so that mature one-dimensional outlier detection methods can then be applied.

The author tested the three methods with exploratory experiments. The results show that the multidimensional scaling method cannot correctly detect the outliers and detects only some edge data; the direct decomposition method correctly detects only some of the outliers but requires little computation; and the local average distance method correctly detects all of the outliers but requires a large amount of computation.

Taking all of the experimental results together, the author suggests first using the direct decomposition method to detect and remove some of the outliers, and then using the local average distance method to detect the remaining ones, which reduces the computation while still detecting all outliers accurately. During the experimental validation, the author also found that detection accuracy can be checked visually only for two-dimensional data, not for higher-dimensional data, so further research is needed.
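The abstract does not spell out how the methods work, so the following Python sketch rests on labeled assumptions. It reads "direct decomposition" as applying a one-dimensional test to each coordinate separately, reads "local average distance" as the mean Euclidean distance from a point to its k nearest neighbours, and uses the 3-sigma rule as the "mature" one-dimensional detector. The function names and the defaults `k=4` and `c=3.0` are hypothetical. The last function chains the two methods in the order the author suggests: a cheap per-coordinate screen first, then the quadratic-cost distance scores on whatever remains.

```python
import math
import statistics
from typing import List, Sequence, Set

Point = Sequence[float]


def three_sigma_outliers(values: Sequence[float], c: float = 3.0) -> Set[int]:
    """Assumed 1-D rule: flag values more than c population SDs from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) > c * sd}


def decomposition_screen(points: List[Point], c: float = 3.0) -> Set[int]:
    """Hypothetical reading of 'direct decomposition': test each coordinate on its own."""
    flagged: Set[int] = set()
    for j in range(len(points[0])):
        flagged |= three_sigma_outliers([p[j] for p in points], c)
    return flagged


def local_average_distance(points: List[Point], k: int) -> List[float]:
    """Mean distance from each point to its k nearest neighbours (n*(n-1) distances)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(statistics.fmean(dists[:k]))
    return scores


def two_stage_detect(points: List[Point], k: int = 4, c: float = 3.0) -> Set[int]:
    """Cheap per-coordinate screen first, then distance scores on the remaining points."""
    first = decomposition_screen(points, c)
    remaining = [i for i in range(len(points)) if i not in first]
    scores = local_average_distance([points[i] for i in remaining], k)
    # Map positions in the reduced list back to indices in the original list.
    second = {remaining[r] for r in three_sigma_outliers(scores, c)}
    return first | second


if __name__ == "__main__":
    # Toy data: two 3x3 grids, plus A = (6, 40) at index 18 and B = (0, 12) at index 19.
    low = [(x, y) for x in range(3) for y in range(3)]
    high = [(x + 10, y + 10) for x in range(3) for y in range(3)]
    pts = low + high + [(6, 40), (0, 12)]
    print(sorted(decomposition_screen(pts)))  # [18]
    print(sorted(two_stage_detect(pts)))      # [18, 19]
```

In this toy data set, A is extreme in its y-coordinate and is caught by the per-coordinate screen, while B lies inside the range of both coordinates and is caught only by the distance scores. With this sketch's 3-sigma rule, screening first also affects accuracy: scoring all 20 points at once flags only A, because A's large score inflates the standard deviation and masks B.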
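For the multidimensional scaling method, the sketch below assumes classical (Torgerson) MDS, which the abstract does not confirm. It double-centres the squared distance matrix and keeps the leading eigenvector, computed by plain power iteration so that only the standard library is needed. The function name `classical_mds_1d` and the `iters` parameter are hypothetical. Any one-dimensional outlier rule can then be applied to the returned coordinates.

```python
import math
from typing import List, Sequence


def classical_mds_1d(points: List[Sequence[float]], iters: int = 500) -> List[float]:
    """Map points to one coordinate with classical MDS (assumed variant)."""
    n = len(points)
    # Squared Euclidean distances between all pairs.
    d2 = [[math.dist(p, q) ** 2 for q in points] for p in points]
    # Double centering B = -1/2 * J D2 J; D2 is symmetric, so row means equal column means.
    means = [sum(row) / n for row in d2]
    grand = sum(means) / n
    b = [[-0.5 * (d2[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    # Power iteration for the leading eigenvector. B is positive semidefinite for
    # Euclidean distances, so the dominant eigenvalue is also the largest one.
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # All points coincide, or the start vector lies in B's null space.
            return [0.0] * n
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the eigenvalue; each coordinate is sqrt(lambda) * v_i.
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = max(sum(vi * bvi for vi, bvi in zip(v, bv)), 0.0)
    return [math.sqrt(lam) * vi for vi in v]
```

A one-dimensional classical MDS solution amounts to projecting the centred data onto its direction of greatest spread. A point lying off that direction can therefore be projected into the middle of the range, while ordinary points at the ends of the main axis look extreme; this is one plausible reason the method flagged edge data rather than the true outliers in the author's experiments.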
Keywords/Search Tags: Multidimensional data, Outlier, Detection method