
A New Technique Of Outlier Detection

Posted on: 2013-02-21
Degree: Master
Type: Thesis
Institution: University
Candidate: Nassir Abdullah Nassir N X E
Full Text: PDF
GTID: 2248330374489577
Subject: Computer Science
Abstract/Summary:
Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Outliers arise from mechanical faults, changes in system behavior, fraudulent behavior, network intrusions, or human error. Outlier detection has attracted attention in a variety of application domains, such as credit card, insurance, and tax fraud detection, intrusion detection for cyber security, and many other areas.

Many data-mining techniques treat outlier detection as only a by-product of clustering. Generally, these techniques define outliers as points that do not lie in any cluster. Thus, the main concern of clustering-based outlier detection algorithms is to find clusters, and outliers are often regarded as noise that should be removed to make the clustering more reliable.

In this dissertation, we first present a theoretical overview of outlier detection methods and data mining techniques. We then propose a new method that provides efficient outlier detection and data clustering capabilities. Our algorithm is based on the idea of filtering the data after executing the clustering process. The proposed method has two main processing stages. The first stage, the clustering process, applies the k-means algorithm. The second stage is a filtering stage that removes outliers, that is, points that lie far from their cluster centroids. The removal decision is based on a chosen threshold.

To check the efficiency and usefulness of our algorithm, we carried out experiments on the dataset used in the KDD Cup 1999 contest. The empirical results indicate that the proposed method successfully detected the intrusion records present in the data, which suggests that our method can be promising in practice. Furthermore, we compared our algorithm against some existing methods on the KDD dataset and obtained better outlier removal performance.
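
The two stages described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state the distance metric, the initialization, the value of k, or how the threshold is chosen, so Euclidean distance, a fixed absolute threshold, and all function and parameter names below (`kmeans`, `filter_outliers`, `init`, `threshold`) are assumptions made for illustration.

```python
import math
import random


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean, assumed)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, init=None, iters=100, seed=0):
    """Stage 1: plain Lloyd-style k-means. Returns (centroids, labels).

    `init` optionally fixes the starting centroids; otherwise k points
    are sampled at random (an assumed initialization).
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def filter_outliers(points, centroids, labels, threshold):
    """Stage 2: flag points farther than `threshold` from their own centroid."""
    inliers, outliers = [], []
    for p, lab in zip(points, labels):
        if math.dist(p, centroids[lab]) > threshold:
            outliers.append(p)
        else:
            inliers.append(p)
    return inliers, outliers
```

In this sketch the threshold is an absolute distance. A relative rule, such as a multiple of each cluster's mean point-to-centroid distance, would be an equally plausible reading of "a chosen threshold"; the abstract does not say which is used.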
Keywords/Search Tags: Technique