Study On Data Partition DBSCAN Using Genetic Algorithm

Posted on:2006-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:S Sun

GTID:2168360155972890

Subject:Computer system architecture

Abstract/Summary:

Data mining uncovers implicit, previously unknown, and potentially valuable knowledge and rules. Clustering is one of its important research fields. Clustering is the process of grouping a set of physical or abstract objects into clusters of similar objects. Each cluster is a set of data objects: an object is similar to the other objects in its own cluster and dissimilar to the objects in other clusters. In many applications, the objects in one cluster can be treated as a whole. Clustering is a very useful tool for analyzing a large, complex, continuous database or data whose structure is completely unknown. Current clustering algorithms fall into several categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
DBSCAN is a typical density-based method. Its merits are that it can find clusters of arbitrary shape and that its clustering result is hardly affected by noise points. Its drawbacks are as follows: when the amount of data is large, the memory requirement is high; the algorithm uses two global parameters, Eps and MinPts, and unsuitable values for these two parameters degrade the clustering result; and when the data are unevenly distributed, using global parameters lowers the clustering quality.
To address these disadvantages of DBSCAN, this thesis proposes Data Partition DBSCAN using Genetic Algorithm (DPDGA). DPDGA uses a Genetic Algorithm based method to find the cluster centers. This method adopts the basic idea of the K-means algorithm, but it optimizes the centers gradually with a Genetic Algorithm rather than with the usual iteration. The advantage of the Genetic Algorithm based method is that it does not need prior knowledge about the data set to be clustered. Experiments show that the cluster centers obtained by the Genetic Algorithm based method are close to the real cluster centers.
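
The Genetic Algorithm based center search described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the encoding, the fitness function, or the genetic operators, so everything here is an assumption. The sketch assumes 2-D points, a fixed number of centers `k`, the K-means objective (sum of squared distances to the nearest center) as the fitness, truncation selection, one-point crossover on the list of centers, and Gaussian mutation. The names `sse` and `ga_centers` and all parameter defaults are hypothetical.

```python
import random


def sse(centers, points):
    """K-means objective: sum of squared distances to the nearest center."""
    return sum(
        min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
        for px, py in points
    )


def ga_centers(points, k, pop_size=30, generations=60, mut_sigma=0.3, seed=0):
    """Search for k cluster centers with a simple GA (illustrative only)."""
    rng = random.Random(seed)
    # Each individual is a list of k candidate centers, seeded from data points.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Lower SSE is fitter; keep the better half unchanged (assumed scheme).
        pop.sort(key=lambda ind: sse(ind, points))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover on the center list
            # Gaussian mutation: each center is perturbed with probability 0.2.
            child = [
                (x + rng.gauss(0, mut_sigma), y + rng.gauss(0, mut_sigma))
                if rng.random() < 0.2
                else (x, y)
                for x, y in child
            ]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda ind: sse(ind, points))
```

Because the better half of the population is carried over unchanged, the best objective value never gets worse from one generation to the next. Replacing the usual K-means update loop with this kind of population search is the idea the abstract describes; the specific operators above are only one possible choice.
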
After obtaining the cluster centers with the Genetic Algorithm based method, DPDGA partitions the data set according to these initial cluster centers. The value of MinPts is computed for every local data set, and DBSCAN is then run on every local data set to obtain a local clustering result. Finally, all local clustering results are merged to obtain the clustering result for the whole data set. Because the data set is partitioned, DPDGA reduces the memory requirement. DPDGA also includes a method for computing the parameter values. For unevenly distributed data, using different parameter values in each local data set reduces the dependence on global parameters and gives better clustering results.
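
The partition, local clustering, and merge steps above can be sketched as follows. This is an illustration under stated assumptions, not the DPDGA procedure itself. The abstract does not say how MinPts is computed for each local data set, so the sketch takes it from a caller-supplied function `min_pts_for` (hypothetical). It also keeps a single shared `eps`, and it "merges" only by giving local clusters globally unique labels, without joining clusters that straddle a partition boundary. Points are assumed to be 2-D, and the DBSCAN here is a plain quadratic-time version written for readability.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist2(points[i], points[j]) <= eps * eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def partition(points, centers):
    """Assign each point index to its nearest center."""
    parts = [[] for _ in centers]
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
        parts[nearest].append(idx)
    return parts


def partitioned_dbscan(points, centers, eps, min_pts_for):
    """Run DBSCAN per partition and relabel clusters to be globally unique."""
    labels = [-1] * len(points)
    next_label = 0
    for idxs in partition(points, centers):
        if not idxs:
            continue
        local = [points[i] for i in idxs]
        # min_pts_for is a hypothetical hook for the per-partition MinPts rule.
        local_labels = dbscan(local, eps, min_pts_for(local))
        for i, lab in zip(idxs, local_labels):
            if lab != -1:
                labels[i] = next_label + lab
        next_label += max(local_labels) + 1  # adds 0 if the partition is all noise
    return labels
```

Each call to `dbscan` only builds neighbor lists for one partition, which is where the reduced memory requirement comes from. A call might look like `partitioned_dbscan(points, centers, eps=0.3, min_pts_for=lambda local: 3)`, with the constant replaced by whatever per-partition rule is actually in use.
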

Keywords/Search Tags:

Data Mining, Clustering, DBSCAN, K-means, Genetic Algorithm, Cluster Center

Related items

1	Optimized K-Means Clustering Analysis Based On Genetic Algorithm
2	A Research Of Developed Algorithms About Text Cluster Center Choose
3	Improved K-means Clustering Based On Genetic Algorithm
4	Parallel K-means Clustering Method And Its Resume Data Applied Research
5	Research And Implementation Of DBSCAN Algorithm Based On Spatial Clustering
6	Data Mining, Cluster Analysis Algorithm Research And Application
7	Research Of K-Means Clustering In Data Mining Based On Genetic Algorithm
8	The Study Of Application And Analysis About Clustering Algorithm In Data Mining
9	The Research Of The K-means Clustering Algorithm Based On Nearest Neighbors
10	Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm