Data mining, an emerging technology that extracts valuable information hidden in complex relationships among data, has developed rapidly because of its practicality. Cluster analysis is an important research area of data mining and, as a form of unsupervised learning, has been widely applied in industry. The K-means clustering algorithm is a typical clustering method. It is simple, but it has several shortcomings: it is sensitive to the initial centers, the number of clusters must be determined in advance, and it easily falls into a local optimum. A genetic algorithm (GA) provides a search model for finding the best solution; applying it to K-means clustering yields the GA-based K-means clustering algorithm. Work on GA-based K-means has refined the chromosome encoding scheme, the genetic operators, and the control parameters in order to obtain better clustering results. More recently, GA-based K-means has been used to choose optimal cluster centers, to obtain the optimal number of clusters, and so on. After summarizing previous research results, this paper proposes a new K-means clustering genetic algorithm that adjusts the value of K automatically and optimizes the cluster centers. Because genetic algorithms have a global optimization ability, this method can overcome the tendency of K-means to converge to a local optimum. The genetic algorithm is introduced into K-means clustering, and its genetic operators are improved to adjust K and optimize the centers. The two specific improvements are as follows. First, the construction of the fitness function. The quality of the fitness function affects the subsequent operation of the whole series of genetic operators.
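The abstract does not state which fitness function is used. As an illustration only, the Java sketch below assumes a real-valued, variable-length chromosome (a list of K candidate centers, so K can differ between individuals) and scores it with the Calinski-Harabasz index. This index is a swapped-in choice, not necessarily the paper's: unlike the raw sum of squared errors, it does not automatically improve as K grows, which matters when the GA also searches over K. The class and method names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical fitness for a variable-length centroid chromosome in a GA-based K-means.
 * Assumption: a chromosome is a list of K candidate centers; fitness is the
 * Calinski-Harabasz index of the partition those centers induce (higher is better).
 */
final class CentroidChromosomeFitness {

    private CentroidChromosomeFitness() {}

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            double t = a[j] - b[j];
            s += t * t;
        }
        return s;
    }

    /** Index of the center closest to point p. */
    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        double bestD = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.size(); k++) {
            double d = sqDist(p, centers.get(k));
            if (d < bestD) { bestD = d; best = k; }
        }
        return best;
    }

    /**
     * Calinski-Harabasz index (B / (K - 1)) / (W / (n - K)), where B is the between-cluster
     * and W the within-cluster scatter. Empty clusters are ignored; returns 0 when fewer
     * than two clusters are non-empty.
     */
    static double fitness(double[][] data, List<double[]> centers) {
        int n = data.length, dim = data[0].length, kAll = centers.size();
        double[][] sums = new double[kAll][dim];
        int[] counts = new int[kAll];
        int[] label = new int[n];
        double[] grand = new double[dim];
        // Assign every point to its nearest center and accumulate per-cluster sums.
        for (int i = 0; i < n; i++) {
            label[i] = nearest(data[i], centers);
            counts[label[i]]++;
            for (int j = 0; j < dim; j++) {
                sums[label[i]][j] += data[i][j];
                grand[j] += data[i][j];
            }
        }
        for (int j = 0; j < dim; j++) grand[j] /= n;
        // Cluster means (the same update a K-means iteration applies) and between-cluster scatter.
        double[][] means = new double[kAll][];
        int k = 0;
        double between = 0;
        for (int c = 0; c < kAll; c++) {
            if (counts[c] == 0) continue;
            k++;
            means[c] = new double[dim];
            for (int j = 0; j < dim; j++) means[c][j] = sums[c][j] / counts[c];
            between += counts[c] * sqDist(means[c], grand);
        }
        if (k < 2 || n <= k) return 0.0;
        // Within-cluster scatter around each point's cluster mean.
        double within = 0;
        for (int i = 0; i < n; i++) within += sqDist(data[i], means[label[i]]);
        if (within == 0) return Double.POSITIVE_INFINITY;
        return (between / (k - 1)) / (within / (n - k));
    }
}
```

For six points forming two tight groups, such as {0,0}, {0,1}, {1,0} and {10,10}, {10,11}, {11,10}, the chromosome with centers {0,0} and {10,10} scores 450, while a poor chromosome with centers {0,0} and {0,1} scores far lower; a chromosome carrying an extra center that attracts no points scores the same as the two-center one.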
In this paper, the fitness function value not only influences the optimization of the cluster centers but also governs the choice of K. The traditional K-means algorithm must fix the number of clusters K first, and this choice depends largely on experience. When the genetic algorithm is introduced into K-means, the population and the fitness value of each individual are used to search for and learn the best number of clusters K, so the choice of fitness function determines how K is selected and optimized. Second, the design of the mutation operator. Based on each individual's fitness value, the mutation operation adjusts the number of clusters K automatically so that it moves closer to the optimal number of clusters. Finally, the proposed algorithm is simulated using the Java programming language and a MySQL database. Besides the commonly used Iris and Glass data sets, a large number of long-distance telephone call records from a telecommunications service are analyzed. Verification on these data and analysis of the results show that the algorithm can handle data of different dimensionality, is scalable, and also has practical value in real telecommunications business applications.
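The second improvement, the fitness-driven mutation that adjusts K, is described only in outline above. The Java sketch below is one possible reading, not the paper's operator. It assumes (1) an adaptive mutation rate in the style of the adaptive GA of Srinivas and Patnaik, where individuals fitter than the population average mutate less and the best one is left intact, and (2) three moves on a variable-length centroid chromosome: add a center (K + 1), delete a center (K - 1), or perturb a center (K unchanged). The class and method names, the bounds `kMin`/`kMax`, and the noise scale `sigma` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical K-adjusting mutation for a variable-length list of cluster centers. */
final class KAdjustingMutation {

    private KAdjustingMutation() {}

    /**
     * Adaptive mutation rate (assumption): pMax for below-average individuals,
     * scaled down linearly to 0 for the best individual.
     */
    static double adaptiveRate(double f, double fAvg, double fMax, double pMax) {
        if (f < fAvg || fMax <= fAvg) return pMax;
        return pMax * (fMax - f) / (fMax - fAvg);
    }

    /**
     * Returns a mutated copy of the parent; the parent is never modified.
     * Assumes the parent holds at least one center and kMin >= 1.
     */
    static List<double[]> mutate(List<double[]> parent, double[][] data,
                                 int kMin, int kMax, double pMutate, double sigma, Random rnd) {
        // Deep copy so mutations never touch the parent's arrays.
        List<double[]> child = new ArrayList<>();
        for (double[] c : parent) child.add(c.clone());
        if (rnd.nextDouble() >= pMutate) return child;

        int move = rnd.nextInt(3);
        if (move == 0 && child.size() < kMax) {
            // Grow K: use a randomly chosen data point as a new center.
            child.add(data[rnd.nextInt(data.length)].clone());
        } else if (move == 1 && child.size() > kMin) {
            // Shrink K: drop a randomly chosen center.
            child.remove(rnd.nextInt(child.size()));
        } else {
            // Keep K (also the fallback when a bound blocks grow/shrink):
            // add Gaussian noise to every coordinate of one randomly chosen center.
            double[] c = child.get(rnd.nextInt(child.size()));
            for (int j = 0; j < c.length; j++) c[j] += sigma * rnd.nextGaussian();
        }
        return child;
    }
}
```

In a full GA loop, each individual's mutation probability would come from `adaptiveRate` using the population's average and maximum fitness, and each child would then be rescored so that selection favors values of K with better fitness.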