Data mining is a new discipline that has arisen from the rapid development of information technology, and clustering is an important branch of the data mining field. Although various clustering algorithms suitable for data mining have been proposed, they still have problems with computational efficiency, the selection of initial values, the quality of the optimized solution, and so on. Clustering algorithms therefore need further research to meet engineering demands, which is important for both the theoretical study and the application of these algorithms. Consequently, three improved variants of the K-means method are proposed in this thesis, and examples demonstrate their higher performance compared with the basic K-means method. The major achievements of this thesis are as follows.

Firstly, the thesis introduces the basic characteristics and definition of data mining, and then analyzes the main technologies used in data mining, the basic concepts of clustering, and the main clustering algorithms. It then focuses on the K-means algorithm, a classic clustering algorithm, and analyzes both its advantages and its disadvantages. Although the basic K-means algorithm has been widely used, it has limitations, especially for large-scale data mining problems.

Secondly, the thesis uses the complex method to optimize the K-means algorithm. The complex method is a heuristic search algorithm that is widely used for its efficient and stable results and its simple procedure. The results of a demonstration example indicate that the improved K-means algorithm has good robustness and high computational efficiency, so it can better satisfy engineering needs.

Thirdly, based on the characteristics of the complex method and the genetic algorithm (GA), the complex method can be treated as an operator of the genetic algorithm to improve its local search ability. The complex method and the genetic algorithm are therefore integrated as complex-GA.
The simulation results for the given example show that complex-GA not only retains the global search ability of the genetic algorithm but also has the strong local search capability of the complex method, avoiding the premature convergence of the genetic algorithm while improving computational efficiency.

Fourthly, an optimization algorithm combining the genetic algorithm and the complex method is used to optimize K-means clustering. In the combined procedure, GA is first used to find the region containing the optimum, and the complex method, which has strong local search capability, is then used to search for the precise optimal solution starting from the GA result. The new method overcomes the disadvantages of the two algorithms...
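
For orientation, the basic K-means procedure that the improvements above start from can be sketched as follows. This is a minimal pure-Python sketch of the standard iteration (assign points to the nearest center, then move each center to the mean of its points). It is not the improved methods of the thesis. The function names (`sse`, `assign`, `update`, `kmeans`), the seeded random initialization, and the toy data are illustrative assumptions. The sum of squared errors returned at the end is the usual K-means objective, and it is presumably the kind of quantity that a search method such as GA or the complex method would be asked to minimize.

```python
import random


def sse(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centers[label]))
        for point, label in zip(points, labels)
    )


def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    labels = []
    for point in points:
        dists = [
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        ]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centers):
    """Move each center to the mean of its members; empty clusters stay put."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [pt for pt, lb in zip(points, labels) if lb == k]
        if members:
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members)
                      for d in range(len(old)))
            )
        else:
            new_centers.append(old)
    return new_centers


def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means; the result depends on the randomly chosen initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial values: k distinct data points
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centers, labels, sse(points, centers, labels)


if __name__ == "__main__":
    # Hypothetical toy data: two small, well-separated groups.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for s in range(3):
        _, found_labels, cost = kmeans(data, 2, seed=s)
        print(s, found_labels, cost)
```

Running the sketch with several seeds makes the dependence on initial values visible: on harder data, different seeds can end at different local optima with different objective values, which is the limitation the global search methods described above are meant to address.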