Study On Multiseed Clustering Algorithm Based On Max-min Distance Algorithm

Posted on:2007-11-22

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhou

Full Text:PDF

GTID:2178360185974876

Subject:Computer system architecture

Abstract/Summary:

Data mining, also known as knowledge discovery in databases (KDD), uncovers implicit, previously unknown, and potentially valuable knowledge and rules. As an emerging interdisciplinary field, it integrates techniques from machine learning, pattern recognition, database technology, statistics, and artificial intelligence. Cluster analysis can serve both as a standalone technique for discovering how data are distributed and as a preprocessing step for other data mining operations, so improving the performance of clustering algorithms is a worthwhile subject of study.

The K-means clustering algorithm is a typical partitioning method: it is easy to implement, scalable, and efficient on large data sets. However, it requires the user to specify the number of clusters in advance, is very sensitive to initial conditions, often gets trapped in a local minimum, and captures only hyperspherical clusters well.

This paper proposes a novel Multiseed Clustering Algorithm based on the Max-min Distance Algorithm (MCAMDA). It mainly targets two weaknesses of K-means: its dependence on initial conditions, and the limitation of its clustering criterion, the sum of squared Euclidean distances between each data point x_k and the centroid m_j of the subset that contains x_k. Combined with a multiple-sampling technique, MCAMDA applies the max-min distance algorithm a second time to the pooled set of candidate seeds to search for optimal initial clustering seeds, which largely avoids the randomness of initial seed selection. It also minimizes the input parameters the user must know: the number of clusters need not be supplied.

Unlike K-means-type algorithms, MCAMDA assigns multiple seeds to a cluster, because an elongated or large cluster can be regarded as the union of a few small, distinct hyperspherical clusters.
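
The seeding idea can be illustrated with a minimal Python sketch. It uses the textbook max-min distance rule and one possible reading of the two-stage procedure (max-min on each random sample, then max-min again on the pooled candidates). The function names, the threshold factor `theta`, and the sampling parameters are assumptions made for illustration; the abstract does not give the thesis's actual parameters or implementation.

```python
import numpy as np


def max_min_seeds(points, theta=0.5):
    """Return indices of seeds chosen by the max-min distance rule.

    theta is an assumed threshold factor (0 < theta < 1): a point becomes
    a new seed only if its distance to the nearest existing seed exceeds
    theta times the distance between the first two seeds.
    """
    points = np.asarray(points, dtype=float)
    seeds = [0]  # assumption: the first point is the first seed
    # Distance from every point to its nearest seed so far.
    nearest = np.linalg.norm(points - points[0], axis=1)
    second = int(np.argmax(nearest))  # farthest point from the first seed
    base = nearest[second]
    if base == 0.0:
        return seeds  # all points coincide, one seed is enough
    seeds.append(second)
    nearest = np.minimum(
        nearest, np.linalg.norm(points - points[second], axis=1))
    while True:
        # The point whose minimum distance to the seeds is maximal.
        candidate = int(np.argmax(nearest))
        if nearest[candidate] <= theta * base:
            break  # no remaining point is far enough from every seed
        seeds.append(candidate)
        nearest = np.minimum(
            nearest, np.linalg.norm(points - points[candidate], axis=1))
    return seeds


def two_stage_seeds(points, n_samples=5, sample_size=50, theta=0.5, rng=None):
    """Pool max-min seeds from random samples, then run max-min on the pool.

    n_samples and sample_size are hypothetical parameters. Returns the
    coordinates of the selected seeds.
    """
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    size = min(sample_size, len(points))
    candidates = []
    for _ in range(n_samples):
        idx = rng.choice(len(points), size=size, replace=False)
        sample = points[idx]
        candidates.append(sample[max_min_seeds(sample, theta)])
    pool = np.vstack(candidates)
    return pool[max_min_seeds(pool, theta)]
```

Because seed selection stops once no point lies farther than the threshold from every seed, the number of seeds follows from the data and `theta` rather than being supplied by the user.
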
Large clusters are temporarily divided into small ones, which are then combined into the final clusters using the basic idea of DBSCAN.

To test the clustering performance of MCAMDA, this paper presents several experiments, all of which show that the improved algorithm yields better and more stable solutions than the K-means algorithm.

Through analyzing the time efficiency of MCAMDA, we find that DBSCAN...
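
The abstract does not state the exact merge rule, so the following Python sketch shows only one simplified interpretation of combining sub-clusters in the spirit of DBSCAN: two sub-clusters are joined when some pair of their points lies within a radius `eps`. The function name, the `eps` parameter, and the omission of DBSCAN's minimum-points density condition are all assumptions, not the thesis's method.

```python
import numpy as np


def merge_subclusters(points, sub_labels, eps):
    """Merge sub-clusters whose closest points are within eps.

    sub_labels holds one integer sub-cluster id per point (for example,
    the index of the nearest seed). Returns one merged label per point.
    """
    points = np.asarray(points, dtype=float)
    sub_labels = np.asarray(sub_labels)
    ids = np.unique(sub_labels)
    parent = {int(i): int(i) for i in ids}  # union-find forest

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        pa = points[sub_labels == a]
        for b in ids[pos + 1:]:
            pb = points[sub_labels == b]
            # All pairwise distances between the two sub-clusters.
            gaps = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
            if gaps.min() <= eps:
                parent[find(int(a))] = find(int(b))
    return np.array([find(int(label)) for label in sub_labels])
```

With this rule, an elongated cluster that was split into adjacent sub-clusters is reassembled, while a well-separated cluster keeps its own label.
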

Keywords/Search Tags:

Data Mining, K-means, Max-min Distance, Sampling, Multiple Seeds

Related items

1	Study On The Class Merging Cluster Algorithm Based On FCM
2	The Research Of K-means Clustering Algorithm In Data Mining
3	Improvement And Application Of K-means Algorithm
4	Research On Multi-view Multiple Clustering Algorithm Based On Sampling And Inverse Optimization
5	The Research On Sampling For Data Mining
6	Research And Application Of K-means Algorithm Based On Density And Distance
7	Research And Application Of Data Mining In The Tax System
8	Research On Application Of Multi-Core Function FCM Algorithm In Data Mining Clustering
9	Research And Application On Sampling Of Tax Enforcement Inspection Based On Data Mining
10	Research On Cluster Analysis For Spatial Data Mining