Research Of K-means Clustering Method Based On Genetic Algorithm

Posted on: 2008-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: W Jin
GTID: 2178360212473589  Subject: Communication and Information System
Abstract/Summary:
Data mining has attracted a great deal of attention in the information industry in recent years, mainly because of the wide availability of huge amounts of data and the pressing need to turn such data into useful information and knowledge. The results of knowledge discovery can be applied to data processing that supports scientific decision-making. Cluster analysis is a basic task of data mining and a form of unsupervised learning. The goal of clustering is to partition a data set, without any prior knowledge, into clusters such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. By clustering, one can identify dense and sparse regions and thereby discover overall distribution patterns and interesting correlations among data attributes. The k-means algorithm is the most widely used method in cluster analysis. Its major shortcoming, however, is its sensitivity to the initial values, which makes it prone to converging to a local optimum. The genetic algorithm is a method of searching for the best solution by imitating natural evolution; its notable features are implicit parallelism and the capacity to use global information effectively. So a k-means clustering method based on the genetic algorithm (GKA) is proposed. It has good global and local search capability, but it clusters more slowly than the k-means algorithm. To speed up clustering, this paper puts forward an improved GKA algorithm. Building on GKA, it modifies all the operators on the premise that solutions with empty clusters are allowed, and it adds an incremental operator, which calculates the cluster centers and the objective function incrementally. This makes the algorithm cluster faster. This paper also designs a clustering analysis system.
Experiments using this system show that the k-means clustering method based on the genetic algorithm performs better than the k-means algorithm. The improved GKA algorithm clusters faster than the original GKA algorithm, and the advantage is more evident when a small mutation probability is used. This paper also proposes using the improved GKA algorithm for user clustering in a Web log mining system. It can avoid the influence of the initial values on the clustering result and can obtain the globally best solution, which helps to offer better personalized services to users and to improve and optimize Web sites.
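The abstract does not give the formulas behind the incremental operator, so the following is only a minimal sketch of one standard way to update cluster centers and a sum-of-squared-errors objective when a single object moves from one cluster to another, without recomputing everything from scratch. The function names (`full_sse`, `move_point`), the choice of squared Euclidean error as the objective, and the use of `None` for the center of an empty cluster are all assumptions made for illustration, not the thesis's own definitions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def full_sse(points, labels, k):
    """Recompute centers, cluster sizes and total squared error from scratch."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # An empty cluster has no center (assumed convention: None).
    centers = [[s / n for s in row] if n else None
               for row, n in zip(sums, counts)]
    total = sum(sq_dist(p, centers[c]) for p, c in zip(points, labels))
    return total, centers, counts


def move_point(x, a, b, centers, counts, total):
    """Move point x from cluster a to cluster b.

    Updates centers and counts in place and returns the new total error,
    using only the two affected clusters.
    """
    if a == b:
        return total
    na, nb = counts[a], counts[b]
    # Remove x from cluster a.
    if na > 1:
        total -= na / (na - 1) * sq_dist(x, centers[a])
        centers[a] = [(na * m - xi) / (na - 1) for m, xi in zip(centers[a], x)]
    else:
        centers[a] = None  # a becomes empty; its lone point contributed zero error
    counts[a] = na - 1
    # Add x to cluster b.
    if nb > 0:
        total += nb / (nb + 1) * sq_dist(x, centers[b])
        centers[b] = [(nb * m + xi) / (nb + 1) for m, xi in zip(centers[b], x)]
    else:
        centers[b] = list(x)  # b was empty; x becomes its center
    counts[b] = nb + 1
    return total
```

Each move costs time proportional to the dimension of one point rather than to the size of the data set. This is the kind of saving that matters when a mutation step changes only a few assignments per individual, and it would be consistent with the reported advantage at small mutation probabilities.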
Keywords/Search Tags: Data Mining, Clustering, K-means Algorithm, Genetic Algorithm, K-means Clustering Algorithm Based on Genetic Algorithm