Optimization Of K-MEANS Clustering Algorithm For Data With Outliers

Posted on:2010-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:A Jiao

GTID:2178330332488536

Subject:Computer application technology

Abstract/Summary:

The K-MEANS clustering algorithm is a simple, widely used iterative method that partitions a given dataset into a user-specified number of clusters, k. Its practical value has been acknowledged across many disciplines. Traditional K-MEANS uses the Euclidean distance between each observation and its cluster center as the distance measure, and the sum of squared errors as the objective function. An outlier is an extreme observation that is numerically distant from the mean of the data sample, and it distorts, to some extent, any statistical test based on the mean and variance. A small number of outliers that are not due to any anomalous condition is nevertheless to be expected in large samples. K-MEANS is therefore inevitably affected by the presence of outliers.

This thesis studies the algorithm and distance measure of K-MEANS and proposes an optimization of K-MEANS based on outlier deletion. The key idea is to turn a known defect of K-MEANS, its tendency to settle into a suboptimal solution, to advantage. Following a cluster-based outlier detection strategy, outliers are searched for and deleted within clusters. The notions of entropy and balance are introduced as the condition for ending the clustering process. To keep K-MEANS from settling into one particular suboptimal solution and ceasing to find outliers, a stimulation mechanism is introduced: the number of clusters, k, changes during the deletion process, following a certain curve. Changing k kicks the iterative process out of the suboptimal solution, which allows outlier deletion to continue, so that as many outliers as possible are found and deleted. Reducing the influence of outliers on K-MEANS effectively improves its ability to locate cluster centers and to cluster correctly.
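
The abstract does not give the details of the method, so the following is only a minimal sketch of the general idea of cluster-based outlier deletion, not the thesis's algorithm. It runs plain K-MEANS with a sum-of-squared-errors objective, then deletes points that lie unusually far from their own cluster center and re-clusters. The function names, the `threshold` rule (mean plus a multiple of the standard deviation of within-cluster distances), the fixed `rounds` limit, and the `init` parameter are all assumptions made for illustration. The entropy/balance stopping condition and the changing k described above are not reproduced here.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, init=None, iters=100):
    """Plain Lloyd-style K-MEANS. Returns (centers, labels, sse).

    Assumption: if no initial centers are given, the first k points are used.
    """
    centers = [tuple(c) for c in (init if init is not None else points[:k])]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, sse

def remove_outliers(points, k, threshold=2.0, rounds=5, init=None):
    """Repeatedly cluster, then delete points far from their own center.

    Illustrative rule (an assumption): within each cluster, a point is flagged
    when its distance to the center exceeds mean + threshold * std of the
    distances in that cluster.
    """
    kept, removed = list(points), []
    for _ in range(rounds):
        centers, labels, _ = kmeans(kept, k, init=init)
        dists = [math.sqrt(sq_dist(p, centers[l])) for p, l in zip(kept, labels)]
        flagged = set()
        for j in range(k):
            idx = [i for i, l in enumerate(labels) if l == j]
            if len(idx) < 2:
                continue  # too few points to estimate a spread
            d = [dists[i] for i in idx]
            mean = sum(d) / len(d)
            std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
            flagged.update(i for i in idx
                           if std > 0 and dists[i] > mean + threshold * std)
        if not flagged:
            break  # nothing left to delete under this rule
        removed.extend(kept[i] for i in sorted(flagged))
        kept = [p for i, p in enumerate(kept) if i not in flagged]
    return kept, removed

# Toy data: two 3x3 grids of points plus one isolated point.
grid_a = [(x, y) for x in range(3) for y in range(3)]
grid_b = [(x + 20, y + 20) for x in range(3) for y in range(3)]
data = grid_a + grid_b + [(1, 12)]
kept, removed = remove_outliers(data, 2, init=[(0, 0), (20, 20)])
print(len(kept), removed)
```

In this toy run the isolated point first pulls the center of its cluster away from the grid; once it is deleted, the next round of clustering places the center back in the middle of the grid. This mirrors, in a very simplified way, the claim that reducing the influence of outliers helps K-MEANS find cluster centers.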

Keywords/Search Tags:

K-MEANS clustering algorithm, Outlier, Cluster-Based Outlier Detection, Entropy

Related items

1	Optimization Of K-MEANS Clustering Algorithm For Data With Outliers
2	Research And Application Of Outlier Detection Algorithm
3	Research And Implementation On Outlier Detection Method Based On SOFM Clustering Algorithm
4	Research On Outlier Detection Based On Density Difference
5	Study Of Parameterless Outlier Detection And Complex-manifold Clustering Algorithm
6	Design And Implementation Of Outlier Detection Algorithms Based On K-means Clustering
7	Study On An Analysis Method For Cluster-based Outlier
8	Based On Information Entropy And The Subspace Outlier Mining Algorithm
9	Variable Selection And Outlier Detection For Automated K-means Clustering
10	Clustering-based And Density Outlier Detection Method