Clustering-based And Density Outlier Detection Method

Posted on:2015-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:J Tao

GTID:2298330422982418

Subject:Probability theory and mathematical statistics

Abstract/Summary:

Outlier detection has attracted increasing attention in statistics and machine learning because of its wide-ranging applications, from machine fault detection, credit card fraud detection, and network intrusion detection to stock market analysis. An outlier is a data point that does not conform to the normal points characterizing the data set.

Depending on the characteristics of the underlying model, previous approaches to outlier detection can be classified into several broad categories: distribution-based, distance-based, density-based, clustering-based, and model-based approaches, among others. Despite much progress in this area, most existing work on outlier detection has limitations. This paper improves a K-means clustering method and the density-based local outlier factor (LOF) detection method. The main results include the following aspects.

(1) We apply the K-means clustering algorithm to divide the data set into clusters. Points lying near the centroid of a cluster are unlikely to be outliers, so such points can be pruned from each cluster. We then calculate a density-based outlier score for the remaining points; because some points have been pruned, the computation needed to calculate the outlier score is reduced considerably. Finally, from the remaining data set we select the n points with the largest LOF values as outliers. The LOF detection algorithm is used as the baseline for comparison. Experimental results on synthetic and real data sets show that the proposed method performs better than the LOF method even though its computational cost is lower.

(2) Because the K-means clustering algorithm is sensitive to outliers, this paper applies a K-means clustering outlier detection method with an L1 penalty term. Using an L1 penalty produces a sparse solution and is relatively easy to solve. This paper combines L1 regularization with the outlier detection problem and verifies the effectiveness of the approach by experiment.

Finally, we point out directions in which the two methods can be extended.
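The pipeline in result (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the pruning rule, so the sketch simply drops the fraction `prune_frac` of each cluster closest to its centroid; K-means uses a deterministic initialization for reproducibility; and LOF is computed only among the surviving candidates. All function names, parameters, and the toy data are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    # Assumption: deterministic init from evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]

    def assign():
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign()

def prune_near_centroids(points, centroids, labels, prune_frac=0.5):
    # Assumed pruning rule: discard the prune_frac of each cluster
    # that lies closest to the cluster centroid.
    survivors = []
    for j, c in enumerate(centroids):
        members = sorted((p for p, lab in zip(points, labels) if lab == j),
                         key=lambda p: math.dist(p, c))
        survivors.extend(members[int(len(members) * prune_frac):])
    return survivors

def lof_scores(points, k):
    # k nearest neighbours of each point as (distance, index) pairs.
    # Assumes no k+1 coincident points (otherwise a division by zero occurs).
    knn = [sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
           for i, p in enumerate(points)]
    kdist = [nb[-1][0] for nb in knn]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(kdist[j], d) for d, j in nb) for nb in knn]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in nb) / (k * lrd[i]) for i, nb in enumerate(knn)]

def top_outliers(points, n_clusters, n_outliers, k_neighbors=3, prune_frac=0.5):
    centroids, labels = kmeans(points, n_clusters)
    candidates = prune_near_centroids(points, centroids, labels, prune_frac)
    scores = lof_scores(candidates, k_neighbors)
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [p for _, p in ranked[:n_outliers]]

# Toy data: two 5x5 grids plus two injected points far from both grids.
grid_a = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(5)]
grid_b = [(10 + 0.5 * i, 10 + 0.5 * j) for i in range(5) for j in range(5)]
data = grid_a + grid_b + [(5.0, 4.0), (16.0, 9.0)]
print(top_outliers(data, n_clusters=2, n_outliers=2))
```

In this toy run LOF is evaluated on 26 candidates instead of all 52 points, which is where the reduction in computation comes from. Computing LOF only among the survivors changes the density estimates relative to LOF on the full data set, so this choice should be read as one possible interpretation of the abstract.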

Keywords/Search Tags:

K-means clustering, LOF method, penalty factor, L1 regularization, outlier

Related items

1	Research And Implementation On Outlier Detection Method Based On SOFM Clustering Algorithm
2	Variable Selection And Outlier Detection For Automated K-means Clustering
3	Optimization Of K-means Clustering Algorithm For Data With Outliers
4	Optimization Of K-MEANS Clustering Algorithm For Data With Outliers
5	Outlier Mining Method Based On Gini Indexes And Sub-space Research
6	Research And Application Of Outlier Detection Algorithm
7	Outlier Mining Algorithm Research And Application
8	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
9	Research On Application Of Collaborative Filtering Recommendation Algorithm In Educational Administration System
10	Research And Application Outlier Detection Method Based On Density&Distance