
Optimization Of K-MEANS Clustering Algorithm For Data With Outliers

Posted on: 2010-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: A Jiao
GTID: 2178330332488536
Subject: Computer application technology
Abstract/Summary:
The K-MEANS clustering algorithm is a simple, widely used iterative method that partitions a given dataset into a user-specified number of clusters, k. Its practical value has been acknowledged across many disciplines. Traditional K-MEANS uses the Euclidean distance between each observation and its cluster center as the distance measure, and the sum of squared errors as the objective function. An outlier is an extreme observation that lies numerically far from the mean of the data sample, and outliers distort, to some extent, every statistical test based on the mean and variance. Moreover, even without any anomalous condition, a small number of outliers is to be expected in large samples. K-MEANS, which relies on means and squared errors, is therefore inevitably affected by outliers.

This thesis studies the K-MEANS algorithm and its distance measure, and proposes an optimization of K-MEANS based on outlier deletion. The key idea is to turn a known weakness of K-MEANS, its tendency to converge to a suboptimal solution, into an advantage. Following a cluster-based outlier detection strategy, outliers are searched for and deleted within clusters. Entropy and balance are introduced as the criteria for ending the clustering process. To keep K-MEANS from settling into a particular suboptimal solution and ceasing to find outliers, a stimulation mechanism is introduced: during the deletion process, the number of clusters, k, is varied along a certain curve. Changing k kicks the iterative process out of the suboptimal solution so that outlier deletion can continue, and as many outliers as possible are found and deleted. By reducing the influence of outliers on K-MEANS, the method effectively improves the algorithm's ability to locate cluster centers and to cluster the data correctly.
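The abstract describes the method only in prose, so the following Python sketch is a hypothetical illustration, not the author's implementation. It runs Lloyd-style K-MEANS with squared Euclidean distance and the sum of squared errors (SSE) as the objective, then wraps it in an outlier-deletion loop. Every specific rule here is an assumption: deterministic farthest-point seeding (the thesis does not state its initialization), flagging all members of clusters smaller than `min_frac` of the data plus any point farther than mean + `z`·std from its cluster center, a cyclic `schedule` of k offsets standing in for the unspecified "curve", and a normalized Shannon entropy of cluster sizes (`balance`) used in the stopping rule. The names `kmeans_outlier_deletion`, `cluster_outliers`, and `balance` are hypothetical.

```python
import math

def dist2(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at points[0], then repeatedly
    add the point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd-style K-MEANS; returns (centers, labels, SSE)."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

def balance(sizes):
    """Hypothetical balance score: Shannon entropy of cluster sizes divided by
    log(k), so 1.0 means all clusters have the same size."""
    n, k = sum(sizes), len(sizes)
    if k <= 1 or n == 0:
        return 1.0
    h = -sum(s / n * math.log(s / n) for s in sizes if s > 0)
    return h / math.log(k)

def cluster_outliers(points, centers, labels, z=2.5, min_frac=0.05):
    """Hypothetical cluster-based detection: flag every member of a tiny cluster,
    and members farther than mean + z * std from their own cluster center."""
    min_size = max(2, int(min_frac * len(points)))
    flagged = set()
    for j, c in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        if len(idx) < min_size:
            flagged.update(idx)
            continue
        d = [math.sqrt(dist2(points[i], c)) for i in idx]
        mu = sum(d) / len(d)
        sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
        flagged.update(i for i, x in zip(idx, d) if x > mu + z * sd)
    return flagged

def kmeans_outlier_deletion(points, k, schedule=(0, 1, 2, 1), max_rounds=50, min_balance=0.5):
    """Hypothetical loop: cluster, delete flagged points, and vary k along `schedule`,
    intended to push the search out of a stuck (suboptimal) partition."""
    data, removed, quiet = list(points), [], 0
    for t in range(max_rounds):
        kt = k + schedule[t % len(schedule)]
        if len(data) <= kt:
            break
        centers, labels, _ = kmeans(data, kt)
        flagged = cluster_outliers(data, centers, labels)
        if flagged:
            removed += [data[i] for i in sorted(flagged)]
            data = [p for i, p in enumerate(data) if i not in flagged]
            quiet = 0
        else:
            quiet += 1
        sizes = [labels.count(j) for j in range(kt)]
        # Stop once a full cycle of k values finds nothing and clusters look balanced.
        if quiet >= len(schedule) and balance(sizes) >= min_balance:
            break
    return data, removed

# Usage: two tight 5x5 grids plus two far-away points.
blob_a = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
blob_b = [(10 + i * 0.1, 10 + j * 0.1) for i in range(5) for j in range(5)]
data = blob_a + blob_b + [(40.0, 40.0), (-30.0, 40.0)]
kept, removed = kmeans_outlier_deletion(data, k=2)
print(len(kept), "points kept; both far points removed:",
      (40.0, 40.0) in removed and (-30.0, 40.0) in removed)
```

In the demo data with k = 2 and this seeding, one far point ends up alone in its own cluster while the other is absorbed into a single cluster that merges both grids, a suboptimal partition; the small-cluster rule removes the first and the distance rule removes the second. Repeated z-score deletion can also peel genuine inliers off heavy-tailed clusters, so `z`, `min_frac`, and the stopping rule would need tuning on real data.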
Keywords/Search Tags: K-MEANS clustering algorithm, Outlier, Cluster-Based Outlier Detection, Entropy