Data mining, also called knowledge discovery in databases (KDD), extracts implicit, previously unknown and potentially valuable knowledge and rules from data. As an emerging interdisciplinary field, data mining integrates techniques from machine learning, pattern recognition, database technology, statistics and artificial intelligence. Cluster analysis can serve both as a stand-alone technique for discovering how data are distributed and as a preprocessing step for other data mining operations. Studying how to improve the performance of clustering algorithms is therefore well worthwhile.

The K-means clustering algorithm is a typical partitioning method: it is easy to implement, scalable, and efficient on large data sets. However, it requires the user to specify the number of clusters in advance. Moreover, it is very sensitive to initial conditions, often gets trapped in local minima, and captures well only clusters of hyperspherical shape.

This paper proposes a novel Multiseed Clustering Algorithm based on the Max-min Distance Algorithm (MCAMDA). It mainly targets two weaknesses of K-means: its dependence on initial conditions, and the limitation of its clustering criterion, which is the sum of squared Euclidean distances between each data point x_k and the centroid m_j of the subset containing x_k. Combined with a multiple-sampling technique, MCAMDA applies the max-min distance algorithm a second time, to the pooled candidate seeds, to search for optimal initial clustering seeds. This largely removes the randomness of initial seed selection. MCAMDA also reduces the input parameters the user must know in advance: the number of clusters need not be supplied.

Unlike K-means-type algorithms, MCAMDA assigns multiple seeds to a cluster, because an elongated or large cluster can be regarded as the union of a few small, distinct hyperspherical clusters.
The large clusters are temporarily divided into several small ones, which are then merged into the final clusters using the basic idea of DBSCAN. To test the clustering performance of MCAMDA, this paper presents several experiments, all of which show that the improved algorithm produces better and more stable solutions than the K-means algorithm.

By analyzing the time efficiency of MCAMDA, we find that DBSCAN...
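The K-means clustering criterion criticized above can be written out explicitly. It sums the squared Euclidean distance between each data point x_k and the centroid m_j of the subset that contains it. In the formula below, K is the number of clusters, and C_j, a label introduced here only for illustration, denotes the j-th subset. K-means tries to minimize J. This favors compact groups around each centroid and is one reason it handles hyperspherical clusters best.

```latex
J = \sum_{j=1}^{K} \sum_{x_k \in C_j} \lVert x_k - m_j \rVert^2,
\qquad
m_j = \frac{1}{|C_j|} \sum_{x_k \in C_j} x_k
```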
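The seed-selection idea can be sketched under one assumption: that MCAMDA uses the classic max-min distance rule, which works as follows.

- The first seed is an arbitrary point.
- The second seed is the point farthest from the first.
- Each further seed is the point whose distance to its nearest existing seed is largest.
- A new seed is accepted only while that distance exceeds `theta` times the distance between the first two seeds.

The threshold `theta`, the sample count and size, and the function names are hypothetical choices, not MCAMDA's exact procedure. `multisample_seeds` mirrors the two-stage scheme described above. It runs max-min on several random samples, then once more on the pooled candidate seeds.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_seeds(points: List[Point], theta: float = 0.5) -> List[Point]:
    """Max-min distance seed selection; theta is a hypothetical threshold in (0, 1)."""
    if len(points) < 2:
        return list(points)
    first = points[0]
    # Second seed: the point farthest from the first one.
    second = max(points, key=lambda p: dist(p, first))
    d12 = dist(first, second)
    if d12 == 0.0:
        return [first]  # all points coincide
    seeds = [first, second]
    # Distance from every point to its nearest seed so far.
    nearest = [min(dist(p, first), dist(p, second)) for p in points]
    while True:
        idx = max(range(len(points)), key=nearest.__getitem__)
        # Stop once the largest nearest-seed distance is no longer "far enough".
        if nearest[idx] <= theta * d12:
            break
        new_seed = points[idx]
        seeds.append(new_seed)
        nearest = [min(d, dist(p, new_seed)) for d, p in zip(nearest, points)]
    return seeds


def multisample_seeds(
    points: List[Point],
    n_samples: int = 5,
    sample_size: int = 50,
    theta: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Run max-min on several random samples, then again on the pooled candidates."""
    rng = rng or random.Random(0)
    candidates: List[Point] = []
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        candidates.extend(max_min_seeds(sample, theta))
    return max_min_seeds(candidates, theta)


# Made-up illustration data: three well-separated 3x3 grids of points.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
pts = [(cx + dx, cy + dy) for cx, cy in centers for dx in offsets for dy in offsets]
print(len(multisample_seeds(pts)))  # 3
```

The number of seeds here is controlled by `theta` rather than by a preset K, so the caller never supplies the number of clusters. The trade-off is that `theta` itself must be chosen.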
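The multi-seed merge step can be illustrated with a sketch that rests on an assumed merge rule. Each point is assigned to its nearest seed. Two seed subclusters are then joined when a DBSCAN-style core point of one lies within `eps` of a core point of the other. A core point is one with at least `min_pts` points, itself included, within `eps`. The parameters `eps` and `min_pts` carry DBSCAN's usual meaning. The merge rule and the helper names are a hypothetical simplification, not the paper's algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_seeds(points: List[Point], seeds: List[Point]) -> List[int]:
    """Label each point with its nearest seed's index (ties go to the lower index)."""
    return [min(range(len(seeds)), key=lambda j: dist(p, seeds[j])) for p in points]


def merge_subclusters(
    points: List[Point], labels: List[int], eps: float, min_pts: int
) -> List[int]:
    """Merge seed subclusters that are linked by DBSCAN-style core points."""
    n = len(points)
    # eps-neighborhood of every point (brute force, O(n^2)), itself included.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]

    parent = list(range(max(labels) + 1))  # union-find over subcluster labels

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        if not core[i]:
            continue
        for j in neighbors[i]:
            # Two core points within eps of each other link their subclusters.
            if core[j] and labels[i] != labels[j]:
                parent[find(labels[i])] = find(labels[j])

    # Renumber the merged groups 0, 1, ... in order of first appearance.
    new_ids: Dict[int, int] = {}
    return [new_ids.setdefault(find(lab), len(new_ids)) for lab in labels]


# Made-up data: a long line (one elongated cluster) plus a small group far away.
line = [(0.5 * i, 0.0) for i in range(41)]
group = [(0.0, 10.0), (0.5, 10.0), (0.0, 10.5)]
points = line + group
seeds = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (0.0, 10.0)]
labels = assign_to_seeds(points, seeds)
final = merge_subclusters(points, labels, eps=0.6, min_pts=2)
print(len(set(labels)), len(set(final)))  # 4 2
```

In this example, the elongated line is seeded at three places and first split into three subclusters. The core-point links then reunite it into one cluster, while the distant group keeps its own label.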