Data mining discovers implicit, previously unknown, and potentially valuable knowledge and rules. Clustering is one of its important research fields. Clustering is the process of grouping a set of physical or abstract objects into clusters of similar objects. Each cluster is a set of data objects: an object is similar to the other objects in its own cluster and dissimilar to the objects in other clusters. In many applications, the objects in one cluster can be treated as a whole. Clustering is a very useful tool for analyzing large, complicated, continuous databases or data whose structure is entirely unknown. Current clustering algorithms fall into several categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. DBSCAN is a typical density-based method. Its merits are that it can find clusters of arbitrary shape and that its results are hardly affected by noise points. Its shortcomings are as follows: when the amount of data is large, the memory requirement is high; the algorithm uses two global parameters, Eps and MinPts, and unsuitable values for them can degrade the clustering result; and when the data distribution is uneven, using global parameters lowers the clustering quality. To address these disadvantages, this paper proposes Data Partition DBSCAN using Genetic Algorithm (DPDGA). DPDGA uses a Genetic Algorithm based method to find the cluster centers. This method follows the basic idea of the K-means algorithm but optimizes the centers gradually with a Genetic Algorithm rather than with the usual iteration. Its advantage is that it does not need prior knowledge about the data set to be clustered. Experiments show that the cluster centers obtained by this method are close to the real cluster centers.
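The idea of searching for cluster centers with a Genetic Algorithm instead of the K-means iteration can be illustrated with a minimal sketch. It is not the paper's implementation: the encoding (one individual is a list of k centers), the K-means style fitness (sum of squared distances to the nearest center), truncation selection, uniform crossover, Gaussian mutation, and the names `sse` and `ga_centers` with all their default parameters are assumptions chosen for illustration.

```python
import random

def sse(points, centers):
    """K-means style objective: squared distance of each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )

def ga_centers(points, k, pop_size=20, generations=40, mutation_scale=0.1, seed=0):
    """Hypothetical GA search for k cluster centers; returns (best centers, best-SSE history)."""
    rng = random.Random(seed)
    # Each individual is a list of k candidate centers, seeded from data points.
    population = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda ind: sse(points, ind))
        history.append(sse(points, population[0]))
        # Truncation selection: the better half survives unchanged (elitism).
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: each center is copied from one of the two parents.
            child = [list(rng.choice((ca, cb))) for ca, cb in zip(a, b)]
            # Gaussian mutation applied to one randomly chosen center.
            target = rng.randrange(k)
            child[target] = [x + rng.gauss(0, mutation_scale) for x in child[target]]
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda ind: sse(points, ind))
    history.append(sse(points, best))
    return best, history
```

Because the best individual always survives, the recorded objective never gets worse from one generation to the next. How close the result comes to the true centers depends on the population size, the mutation scale, and the number of generations, all of which are placeholders here.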
After obtaining the cluster centers with the Genetic Algorithm based method, DPDGA partitions the data set according to these initial cluster centers. The value of MinPts is computed for every local data set, and DBSCAN is then run on every local data set to obtain local clustering results. Finally, all local clustering results are merged to obtain the clustering result for the whole data set. Because the data set is partitioned, DPDGA reduces the memory requirement. DPDGA also proposes a method for computing the parameter values. For unevenly distributed data sets, using different parameter values in each local data set reduces the dependence on global parameters and yields better clustering results.
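The partition, local clustering, and merge steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the abstract does not give the formula for the local MinPts, so `local_min_pts` is a hypothetical placeholder (half the mean Eps-neighborhood size, at least 2); Eps is taken as a given input; and the merge step only makes local cluster labels globally unique, without joining clusters that straddle a partition boundary. All function names are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

def local_min_pts(points, eps):
    """Placeholder heuristic (assumption): half the mean neighborhood size, at least 2."""
    mean = sum(len(region_query(points, i, eps)) for i in range(len(points))) / len(points)
    return max(2, round(mean / 2))

def partitioned_dbscan(points, centers, eps):
    """Partition by nearest center, run DBSCAN per partition, then relabel globally."""
    parts = {c: [] for c in range(len(centers))}
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        parts[nearest].append(idx)
    labels = [-1] * len(points)
    offset = 0
    for indices in parts.values():
        if not indices:
            continue
        local = [points[i] for i in indices]
        local_labels = dbscan(local, eps, local_min_pts(local, eps))
        for i, lab in zip(indices, local_labels):
            if lab != -1:
                labels[i] = lab + offset
        offset += max(local_labels) + 1  # adds 0 when the partition is all noise
    return labels
```

Each call to `dbscan` only needs the points of one partition in memory, which is the source of the reduced memory requirement, and each partition gets its own MinPts, which is what lets sparse and dense regions be treated differently.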