
A linear wrapper method for detection of atypical points in classification

Posted on: 2006-04-10
Degree: Ph.D.
Type: Dissertation
University: Dalhousie University (Canada)
Candidate: Hashemi Mohammadabad, Saeed
GTID: 1458390008454926
Subject: Computer Science
Abstract/Summary:
The focus of this research is the detection of atypical data in a dataset using a linear wrapper approach. Atypical points are the misclassified points that the proposed algorithm, Atypical Sequential Removing (ASR), finds not useful for the classification task; they may include outliers and/or overlapping samples. Most available atypical-detection techniques use a filter approach, which does not require the filter to be consistent with the classifier in use. The fastest available wrapper techniques, on the other hand, have quadratic running time, which is prohibitive in practice for sample subset selection. The approach presented in this research is a linear wrapper technique that, instead of relying on predetermined criteria, uses only the classifier itself and a performance measure to identify atypical points in the data. As a result, it is expected to be more consistent with the classifier in use. Using a cross-validation scheme, ASR provides reliable test performance while identifying and ranking the atypical points in the whole dataset. To verify that ASR does not remove informative misclassified points, Ada-boost was compared with S-boost (trained on the data with the atypical points removed). The results showed that when a significant portion of the misclassified points was removed from the training set, S-boost performed very close to Ada-boost. In the comparison between ASR and the Mahalanobis filter method, the results show that ASR was more accurate in identifying atypical points; that it was more consistent with the classifier in use, keeping performance as high as that of the classifier trained with no points removed; and that it was able to remove 30% more points than the Mahalanobis filter. However, the claim in the literature that removing some points from the training set can enhance classifier performance was not confirmed for overall performance under the linear wrapper tested.
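The abstract does not spell out ASR's exact procedure, so the following is only a minimal sketch of a generic linear wrapper in the spirit it describes, not the dissertation's algorithm. Assumptions (none taken from the source): a nearest-centroid classifier stands in for "the classifier in use"; accuracy on a single held-out validation set stands in for the performance measure and the cross-validation scheme; the candidates are the training points misclassified by the model trained on all data; and a candidate is removed only if removing it does not lower validation accuracy. Each candidate triggers exactly one retraining, so the number of retrainings grows linearly with the number of training points. All names (`fit_centroids`, `predict`, `accuracy`, `sequential_removal`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Dict[int, List[float]]


def fit_centroids(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Hypothetical stand-in classifier: one mean vector (centroid) per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model: Model, x: Point) -> int:
    # Assign x to the class whose centroid is closest (squared Euclidean distance).
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def accuracy(model: Model, X: Sequence[Point], y: Sequence[int]) -> float:
    # Assumed performance measure: fraction of correctly classified points.
    if not X:
        return 0.0
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)


def sequential_removal(X_train, y_train, X_val, y_val):
    """Hypothetical linear wrapper: at most one retraining per candidate point.

    Returns the indices of removed training points (in removal order)
    and the validation accuracy of the final training subset.
    """
    keep = list(range(len(X_train)))
    model = fit_centroids(X_train, y_train)
    best = accuracy(model, X_val, y_val)
    # Candidates: training points misclassified by the model trained on all data.
    candidates = [i for i in keep if predict(model, X_train[i]) != y_train[i]]
    removed: List[int] = []
    for i in candidates:
        trial = [j for j in keep if j != i]
        if not trial:
            continue
        m = fit_centroids([X_train[j] for j in trial], [y_train[j] for j in trial])
        score = accuracy(m, X_val, y_val)
        # Wrapper criterion: the classifier itself decides. Keep the removal
        # only if validation accuracy does not drop.
        if score >= best:
            keep, best = trial, score
            removed.append(i)
    return removed, best


if __name__ == "__main__":
    # Toy data: the last point is labeled 0 but sits at the centre of class 1.
    X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (9, 10), (10, 9), (10, 10), (9.5, 9.5)]
    y_train = [0, 0, 0, 0, 1, 1, 1, 1, 0]
    X_val = [(0.5, 0.5), (9.5, 9.5), (5.5, 5.5)]
    y_val = [0, 1, 1]
    print(sequential_removal(X_train, y_train, X_val, y_val))  # ([8], 1.0)
```

The contrast with a filter is that no fixed distance threshold decides which points to drop; the classifier and the performance measure do. Swapping in a different classifier changes which points are flagged, which is the sense in which a wrapper is "consistent with the classifier in use".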
Experiments on 20 benchmark datasets with 7 classifiers show promising results and confirm that this linear wrapper method has some advantages and can be used to detect atypical points.
Keywords: Linear wrapper, Atypical, Detection, Method, Consistent with the classifier, ASR