With the rapid development of information technology and database technology, the volume of data in people's daily lives is growing explosively. Data mining, a technology for extracting useful information from large data sets, helps people make scientific, data-driven decisions. Cluster analysis is a fundamental data mining technique and an unsupervised classification method. Through cluster analysis, large amounts of data can be divided into clusters without any prior knowledge, so that objects within the same cluster are highly similar while objects in different clusters are dissimilar. Interesting patterns can then be discovered in the data. Among clustering methods, the K-means algorithm is one of the most widely used partition-based algorithms. Its main disadvantage is that it is sensitive to the initial cluster centers and tends to converge prematurely to a locally optimal solution. To solve this problem, we exploit the global optimization ability of genetic algorithms and propose two improved algorithms: a K-means clustering algorithm based on an adaptive genetic algorithm, and a K-means clustering algorithm based on a DNA genetic algorithm. We verified the validity of both algorithms on sample data. We have also applied the improved algorithm to customer behavior segmentation for China Mobile and achieved good results. The main work is as follows:
1) Combining the advantages of the genetic algorithm and the K-means clustering algorithm, this paper designs a K-means clustering algorithm based on an adaptive genetic algorithm.
The improved algorithm uses the global optimization ability of the genetic algorithm to obtain optimized initial cluster centers, and then runs K-means from those centers to produce the final clustering result.
2) To address the sensitivity of K-means to the initial cluster centers, this paper proposes a K-means clustering algorithm based on a DNA genetic algorithm. It adopts DNA encoding and introduces two crossover operators, which improve population diversity and thereby effectively avoid premature convergence. In addition, this paper proposes a new multi-step evolution strategy that enhances the global search capability of the algorithm. Finally, the Rosenbrock test function is used to verify the validity of the algorithm, and sample data are used to verify the accuracy of its clustering results.
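The two-stage idea in item 1 (search for initial centers with a genetic algorithm, then refine them with K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a chromosome is assumed to be the k centers themselves (real-valued encoding), fitness is assumed to be 1/(1 + SSE), and the adaptive crossover and mutation probabilities follow the Srinivas and Patnaik scheme, which is swapped in here because the abstract does not state its own adaptation rule. All function names (`assign`, `sse`, `adaptive_rate`, `ga_initial_centers`, `kmeans`) and parameter values are hypothetical.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every point."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def adaptive_rate(f, f_max, f_avg, k):
    """Assumed Srinivas-Patnaik-style rule: below-average individuals get rate k;
    above-average ones get a rate that falls linearly to 0 at the best fitness."""
    if f < f_avg or f_max == f_avg:
        return k
    return k * (f_max - f) / (f_max - f_avg)


def ga_initial_centers(X, k, pop_size=20, generations=50, pc=0.9, pm=0.2, seed=0):
    """Search for good initial centers; each chromosome is a (k, d) array of centers."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Seed every chromosome with k distinct data points.
    pop = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(pop_size)])
    scale = X.std(axis=0)
    for _ in range(generations):
        fit = np.array([1.0 / (1.0 + sse(X, c)) for c in pop])
        f_max, f_avg = fit.max(), fit.mean()
        elite = pop[fit.argmax()].copy()
        # Roulette-wheel selection: probability proportional to fitness.
        idx = rng.choice(pop_size, pop_size, p=fit / fit.sum())
        children, cfit = pop[idx].copy(), fit[idx]
        # Crossover: swap a random subset of whole centers between paired parents.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < adaptive_rate(max(cfit[i], cfit[i + 1]), f_max, f_avg, pc):
                mask = rng.random(k) < 0.5
                tmp = children[i, mask].copy()
                children[i, mask] = children[i + 1, mask]
                children[i + 1, mask] = tmp
        # Mutation: Gaussian jitter on one center (parent fitness used as a proxy).
        for i in range(pop_size):
            if rng.random() < adaptive_rate(cfit[i], f_max, f_avg, pm):
                j = rng.integers(k)
                children[i, j] += rng.normal(0.0, 0.1, d) * scale
        children[0] = elite  # elitism: the best individual always survives
        pop = children
    fit = np.array([1.0 / (1.0 + sse(X, c)) for c in pop])
    return pop[fit.argmax()]


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    centers = centers.astype(float).copy()
    for _ in range(max_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            centers = new
            break
        centers = new
    return centers, assign(X, centers)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three synthetic 2-D blobs (illustrative data, not the thesis's sample data).
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in ((0, 0), (3, 3), (0, 3))])
    init = ga_initial_centers(X, k=3)
    centers, labels = kmeans(X, init)
    print("SSE of GA seed:", sse(X, init), "after K-means:", sse(X, centers))
```

Elitism keeps the best individual in every generation, so the best SSE found by the GA never gets worse, and the final Lloyd pass can only lower the SSE of the seed centers further.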
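Item 2 does not spell out its DNA coding, so the sketch below only illustrates one common convention from the DNA genetic algorithm literature, adopted here as an assumption: each variable is a strand segment over the bases C, T, A, G read as the digits 0 to 3, decoded as a base-4 integer and scaled linearly into a search interval. The decoded vector is then scored with the Rosenbrock function, the benchmark named above. The two crossover operators and the multi-step evolution strategy are not reproduced; the names `BASE_VALUE`, `decode`, `decode_individual`, and `rosenbrock` are hypothetical.

```python
# Assumed base-to-digit mapping (C=0, T=1, A=2, G=3); the thesis's own
# mapping is not given in the abstract.
BASE_VALUE = {"C": 0, "T": 1, "A": 2, "G": 3}


def decode(strand, lo, hi):
    """Read a strand as a base-4 integer and map it linearly onto [lo, hi]."""
    value = 0
    for base in strand:
        value = value * 4 + BASE_VALUE[base]
    return lo + (hi - lo) * value / (4 ** len(strand) - 1)


def decode_individual(strand, n_vars, lo, hi):
    """Split a strand into n_vars equal segments and decode one variable from each."""
    seg = len(strand) // n_vars
    return [decode(strand[i * seg:(i + 1) * seg], lo, hi) for i in range(n_vars)]


def rosenbrock(x):
    """Generalized Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


if __name__ == "__main__":
    strand = "GATCCTAGGATC"  # hypothetical 12-base individual, 6 bases per variable
    x = decode_individual(strand, 2, -2.048, 2.048)
    print(x, rosenbrock(x))
```

A genetic algorithm built on this encoding would evaluate `rosenbrock(decode_individual(strand, 2, -2.048, 2.048))` for each individual and favor smaller values; [-2.048, 2.048] is the interval customarily used for this function in genetic algorithm benchmarks.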