The Research Of K-means Clustering Algorithm Improvement Based On Genetic Algorithm

Posted on: 2007-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
GTID: 2178360185474364
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Clustering is an important method of partitioning or grouping data and is applied in many fields, including data mining. It has been used in commerce, market analysis, biology, Web classification, and other areas. Existing clustering algorithms include partitioning, hierarchical, density-based, grid-based, and model-based algorithms, as well as fuzzy clustering. The k-means algorithm is one of the major clustering algorithms and belongs to the partitioning methods. It chooses k points (k being the number of clusters) as the initial cluster centers and then performs the clustering through iteration. The algorithm is relatively scalable and efficient, but it has inherent deficiencies: the iterative process is not guaranteed to converge to the optimum, and it often converges to a local optimum instead of the global optimum. The number of clusters k must be known before clustering, which is difficult for inexperienced users, and experiments show that the choice of initial cluster centers also strongly affects the clustering result. The genetic algorithm, based on the concept of biological evolution, uses a series of processes to optimize a solution. These processes include gene combination, crossover, mutation, and natural selection. Through them, bad genes are eliminated according to the principle of "survival of the fittest", and the solution evolves in a better direction.
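The sensitivity of k-means to its initial centers can be illustrated with a minimal sketch. The code below is a plain k-means written for illustration only, not the thesis's implementation; the function names (`kmeans`, `assign`, `sse`) and the four-point toy data set are assumptions chosen so that one initialization reaches the global optimum while another stalls in a local optimum.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def kmeans(points, initial_centers, max_iter=100):
    """Plain k-means: alternate assignment and center update until stable."""
    centers = [tuple(c) for c in initial_centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep the old center if a cluster lost all its points.
                new_centers.append(old)
        new_labels = assign(points, new_centers)
        centers = new_centers
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


# Hypothetical toy data: two tight pairs of points, far apart along x.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers placed inside the two natural groups.
good_centers, good_labels = kmeans(points, [(0, 0.5), (4, 0.5)])
# Initial centers that split the data along y instead of x.
bad_centers, bad_labels = kmeans(points, [(2, 0), (2, 1)])

good_sse = sse(points, good_centers, good_labels)
bad_sse = sse(points, bad_centers, bad_labels)
```

Both runs converge, but the second one settles on a partition with a much larger sum of squared errors, which is the local-optimum behavior the abstract describes.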
The genetic algorithm begins with a group of initial feasible solutions, searches the feasible region globally and effectively using only the objective function as information, and converges to the global optimum with probability 1. These desirable properties make the genetic algorithm a useful tool for combinatorial and function optimization, and it has become a research hotspot in the field of computational intelligence. In this thesis, we first study genetic algorithms for clustering, discuss the encoding scheme and the construction of the fitness function, and analyze the influence of different genetic operations on the clustering result. Then, after analyzing how the initial values of the k-means algorithm are selected, we propose a hybrid clustering algorithm that improves k-means by using a genetic algorithm. First, we use a k-learning genetic algorithm to identify the number of clusters; then we use the clustering result of the genetic clustering algorithm as the initial cluster centers for k-means. These two steps are performed on a small database sampled uniformly from the whole database. With the number of clusters and the initial cluster centers known, we finally use the k-means algorithm to complete the clustering on the whole database. Because the genetic algorithm searches for the best solution by simulating the process of evolution, its most distinctive traits are implicit parallelism and the ability to exploit global information, so the algorithm is robust and avoids getting trapped in the local...
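The hybrid procedure outlined in the abstract (uniform sampling, a genetic search for initial centers on the sample, then k-means on the whole data set) might be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the thesis's algorithm: the k-learning step is omitted and k is passed in directly, a chromosome is assumed to be a list of k sample points, the fitness is assumed to be the sum of squared errors on the sample, and the selection, crossover, and mutation operators as well as all names and parameter values are hypothetical.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain k-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # An empty cluster keeps its previous center.
        updated = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def ga_initial_centers(sample, k, rng, pop_size=20, generations=30, p_mut=0.2):
    """Evolve k candidate centers on the sample (assumed encoding and operators)."""
    population = [rng.sample(sample, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Lower SSE means fitter; keep the better half (truncation selection).
        population.sort(key=lambda ch: sse(sample, ch))
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < p_mut:           # mutation: swap in a sample point
                child[rng.randrange(k)] = rng.choice(sample)
            children.append(child)
        population = parents + children
    return min(population, key=lambda ch: sse(sample, ch))


def hybrid_cluster(data, k, sample_size, seed=0):
    """Sample uniformly, search for centers with the GA, refine on all data."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    seeds = ga_initial_centers(sample, k, rng)
    return kmeans(data, seeds)


# Hypothetical test data: three well-separated groups of 30 points each.
gen = random.Random(1)
data = [(cx + gen.uniform(-1, 1), cy + gen.uniform(-1, 1))
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(30)]
centers = hybrid_cluster(data, k=3, sample_size=30)
total_sse = sse(data, centers)
```

Running the genetic search only on the sample keeps its cost independent of the full data size, which appears to be the motivation for the sampling step; the final k-means pass then only needs to refine centers that are already close to a good solution.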
Keywords/Search Tags: Clustering, k-means algorithm, Genetic algorithm