With the development of the social production and the pace of life, the information technology advances quickly, which leads to the amount of the data information increase exponentially. Data mining technology has powerful data analysis and processing capabilities, provides important and valuable information or knowledge to make decision, and brings immeasurable value to society. Therefore, the study of data mining methods has important theoretical and practical significance.Clustering analysis is one of the main tasks of data mining. K-means algorithm is the most commonly used clustering method. K-means algorithm possesses strong local search capability and rapid convergence capability, and the clustering result is independent of the order of the sample data. However, the algorithm is very sensitive to the initial cluster centers, and easily falls into local minimum. Genetic algorithm has strong ability of global query optimization and is widely used to solve various optimization problems. In computation, we needs to determine the objective function and fitness function, and gradient information and other auxiliary knowledge are not necessary. Therefore, combining genetic algorithm and k-means algorithm not only have strong capability of global query optimization of the genetic algorithm but also maintains strong local search capability of k-means algorithm.This dissertation mainly focuses on making a better combination of genetic algorithm and k-means algorithm to complement each other and improve the efficiency of clustering algorithm. Specially, to the clustering problem the standard genetic algorithm is improved. The main details are as following: Firstly, on the basis of remaining the search space unchanged after maintaining crossover and mutation and applying floating-point number coding method to the genetic algorithm, the chromosome encoding length reduced; Secondly, using arithmetic crossover operator and uniform mutation operator of the shortest distance gene matching obtains a meaningful new chromosomes; Thirdly, parent population competing strategy is employed to replace the usual optimal preservation strategy , and then to improve the algorithm convergence speed. Finally, combing the used method with two kinds of stopping criterion, the running time of the algorithm is effectively shortened by controlling the computing process. The two stopping criterions: Genetic algorithm stops when the evolution algebra of populations reaches the specified termination algebra; Genetic algorithm stops when the difference between the average fitness values of the population individuals of the successive iterations is less than a minimum threshold. If one of the two kinds of criteria is satisfied, the genetic algorithm stops.This dissertation presents an improved genetic k-means algorithm (IGK), which is used to make a combination of the improved genetic algorithm and the k-means algorithm. An improved genetic algorithm is utilized to optimize the initial cluster centers, and then the k-means algorithm is carried. Test results show that IGK algorithm can avoid clustering algorithm to reach a local minimum. Further, the algorithm's stability is high and convergence speed is fast. |