
Clustering-Based and Density-Based Outlier Detection Method

Posted on: 2015-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Tao
GTID: 2298330422982418
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Outlier detection has attracted increasing attention in statistics and machine learning because of its wide range of applications, from machine fault detection, credit card fraud detection, and network intrusion detection to stock market analysis. An outlier is a data point that does not conform to the normal points characterizing the data set.

Depending on the characteristics of the underlying model, previous approaches to outlier detection can be classified into several broad categories: distribution-based, distance-based, density-based, clustering-based, and model-based approaches. Despite much progress in this area, most existing outlier detection methods have certain limitations. This thesis improves two of them: outlier detection based on K-means clustering and the density-based local outlier factor (LOF) method. The main results are as follows.

(1) The K-means clustering algorithm is applied to divide the data set into clusters. Points lying near the centroid of a cluster are unlikely to be outliers, so they are pruned from each cluster. A density-based outlier score (LOF) is then calculated for the remaining points only; because of the pruning, the computation needed to obtain the scores is reduced considerably. Finally, the n remaining points with the largest LOF values are reported as outliers. The standard LOF detection algorithm is used as the baseline for comparison. Experimental results on synthetic and real data sets show that the proposed method performs better than the LOF method even though its computational cost is lower.

(2) Because the K-means clustering algorithm is sensitive to outliers, this thesis applies a K-means clustering outlier detection method with an L1 penalty term. An L1 penalty produces a sparse solution, and the resulting problem is relatively easy to solve. The aim is to combine L1 regularization with the outlier detection problem, and the effectiveness of the approach is verified by experiment.

Finally, directions for extending the two methods are pointed out.
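The pipeline in result (1), cluster, prune, then score the survivors with LOF, can be sketched roughly as below. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the pruning rule, so the sketch assumes a hypothetical `prune_fraction` (the share of each cluster nearest its centroid that is discarded), computes LOF among the surviving points only, and takes hand-picked initial centroids so the run is reproducible. All function and parameter names are invented for the example. It uses only the standard library (`math.dist` needs Python 3.8 or later).

```python
import math


def kmeans(points, init_centroids, iters=100):
    """Plain Lloyd iterations from given starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def prune_near_centroids(points, centroids, labels, prune_fraction):
    """Indices that survive pruning (assumed rule: drop the nearest fraction)."""
    kept = []
    for c, centre in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(points[i], centre))
        kept.extend(members[int(prune_fraction * len(members)):])
    return sorted(kept)


def lof_scores(points, k):
    """LOF of every point; assumes len(points) > k and no duplicate points."""
    n = len(points)
    k_dist, neighbours = [], []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        k_dist.append(dists[k - 1][0])
        # k-distance neighbourhood, including ties at the k-th distance.
        neighbours.append([(d, j) for d, j in dists if d <= k_dist[i]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


def cluster_pruned_lof(points, init_centroids, k=3, prune_fraction=0.35,
                       n_outliers=2):
    """Return indices (into points) of the n_outliers highest-LOF survivors."""
    centroids, labels = kmeans(points, init_centroids)
    kept = prune_near_centroids(points, centroids, labels, prune_fraction)
    scores = lof_scores([points[i] for i in kept], k)
    ranked = sorted(zip(scores, kept), reverse=True)
    return [idx for _, idx in ranked[:n_outliers]]


# Toy data: two 5x5 grids plus two planted outliers at indices 50 and 51.
cluster_a = [(x, y) for x in range(5) for y in range(5)]
cluster_b = [(x, y) for x in range(10, 15) for y in range(10, 15)]
data = cluster_a + cluster_b + [(-5, 2), (12, 19)]
found = cluster_pruned_lof(data, init_centroids=[(0, 0), (14, 14)])
print(sorted(found))  # [50, 51]
```

On this toy data the pruning step removes the nine interior points of each grid, so LOF is evaluated on 34 points instead of 52. Whether neighbours should be drawn from the surviving points only or from the full data set is a design choice the abstract leaves open; the sketch takes the cheaper option.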
Keywords/Search Tags: K-means clustering, LOF method, penalty factor, L1 regularization, outlier