Research Of K-means Clustering Method Based On Genetic Algorithm

Posted on:2008-07-27

Degree:Master

Type:Thesis

Country:China

Candidate:W Jin

Full Text:PDF

GTID:2178360212473589

Subject:Communication and Information System

Abstract/Summary:

Data mining has attracted a great deal of attention in the information industry in recent years, mainly because huge amounts of data are now widely available and there is a pressing need to turn such data into useful information and knowledge. The results of knowledge discovery research can be applied to data processing that supports scientific decision-making. Cluster analysis is a basic task of data mining and a form of unsupervised learning. The goal of clustering is to partition a data set, without any prior knowledge, into clusters such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Through clustering, one can identify dense and sparse regions and thereby discover overall distribution patterns and interesting correlations among data attributes.

The k-means algorithm is the most widely used method in cluster analysis. Its major shortcoming, however, is its sensitivity to initial values, which makes it prone to converging to a local optimum. The genetic algorithm is a method that searches for the best solution by imitating natural evolution; its notable features are implicit parallelism and the capacity to make effective use of global information. A k-means clustering method based on the genetic algorithm (GKA) has therefore been proposed. It has good global and local search capability, but it clusters more slowly than the k-means algorithm. To speed up clustering, this paper puts forward an improved GKA algorithm. Building on GKA, it modifies all of the operators on the premise that solutions with empty clusters are allowed, and it adds an incremental operator that calculates the cluster centers and the objective function incrementally. These changes make the algorithm cluster faster.

This paper also designs a clustering analysis system. Experiments with this system show that the k-means clustering method based on the genetic algorithm performs better than the k-means algorithm. The improved GKA algorithm clusters faster than the original GKA algorithm, and the advantage is more evident when a small mutation probability is used. This paper further proposes applying the improved GKA algorithm to user clustering in a Web log mining system. Doing so avoids the influence of initial values on the clustering result, can obtain the global optimum, and can help offer better personalized services to users and improve and optimize Web sites.
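
The abstract does not give the details of the incremental operator, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes the objective function is the within-cluster sum of squared errors (SSE) and that a single object is moved from one cluster to another, as a mutation might do. The class name `ClusterState` and its methods are hypothetical. Per-cluster sums and counts are kept so that the centers and the SSE can be updated in time proportional to the number of attributes rather than recomputed over the whole data set, and clusters are allowed to become empty.

```python
from typing import List, Optional

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ClusterState:
    """Hypothetical helper: tracks per-cluster sums, counts and total SSE."""

    def __init__(self, points: List[Point], labels: List[int], k: int):
        self.points = points
        self.labels = list(labels)
        dim = len(points[0])
        self.counts = [0] * k
        self.sums = [[0.0] * dim for _ in range(k)]
        for p, c in zip(points, self.labels):
            self.counts[c] += 1
            for d, v in enumerate(p):
                self.sums[c][d] += v
        # Full (non-incremental) computation of the objective, done once.
        self.sse = sum(sq_dist(p, self.center(c))
                       for p, c in zip(points, self.labels))

    def center(self, c: int) -> Optional[Point]:
        """Center of cluster c, or None if the cluster is empty."""
        n = self.counts[c]
        return None if n == 0 else [s / n for s in self.sums[c]]

    def move(self, i: int, dst: int) -> None:
        """Move object i to cluster dst, updating centers and SSE incrementally."""
        src = self.labels[i]
        if src == dst:
            return
        x = self.points[i]
        n_src, n_dst = self.counts[src], self.counts[dst]
        # Removing x from a cluster of n objects lowers its SSE by
        # n/(n-1) * ||x - center||^2; a singleton contributes zero either way.
        if n_src > 1:
            self.sse -= n_src / (n_src - 1) * sq_dist(x, self.center(src))
        # Adding x to a cluster of n objects raises its SSE by
        # n/(n+1) * ||x - center||^2; an empty cluster gains zero.
        if n_dst > 0:
            self.sse += n_dst / (n_dst + 1) * sq_dist(x, self.center(dst))
        for d, v in enumerate(x):
            self.sums[src][d] -= v
            self.sums[dst][d] += v
        self.counts[src] -= 1
        self.counts[dst] += 1
        self.labels[i] = dst


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0], [10.0, 10.0]]
    state = ClusterState(pts, [0, 0, 1, 1, 2], k=3)
    state.move(4, 1)  # cluster 2 becomes empty, which is allowed here
    print(state.counts, state.sse)
```

The incrementally maintained `sse` can be checked against a fresh `ClusterState` built from the updated labels, which recomputes the objective from scratch.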

Keywords/Search Tags:

Data Mining, Clustering, K-means Algorithm, Genetic Algorithm, k-means clustering algorithm based on genetic algorithm

Related items

1	Research Of K-means Clustering Method Based On Genetic Algorithm
2	Clustering Analysis Of K-means Based On Improved Genetic Algorithm
3	Study Of K-Means Clustering Based On Genetic Algorithm
4	Study Of K-means Clustering Based On Genetic Algorithm
5	Analysis And Research Of K-means Algorithm Based On Genetic Algorithm
6	Research And Application Of K-means Algorithm In Data Mining Technology Based On Genetic Algorithm
7	Improved K-means Clustering Based On Genetic Algorithm
8	Research Of K-Means Clustering In Data Mining Based On Genetic Algorithm
9	Research On Clustering Algorithm Based On Genetic Algorithm And Rough Set Theory
10	Based On The Context Of Genetic K-means Clustering Algorithm Model Of Quantitative Research