Improvement Of The K-means Algorithm

Posted on:2013-03-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Tang

Full Text:PDF

GTID:2268330392462642

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Now with the development of computer network and database technology, the informationgrows rapidly. How to find useful information has become a very important subject, and datamining emerged in this context. Data mining is to discover useful information or knowledgefrom a large number of data to help to make scientific decision. Cluster analysis is an importantanalytical tool in data mining and is an unsupervised classification method as well. The purposeof clustering analysis is to divide a data set into several clusters, making the data in the samecluster as similar as possible and the data in different clusters as different as possible. There aremany algorithms on cluster analysis, among which the K-means clustering is the most widelyused and most popular algorithm, with its simple idea and fast convergence. However, theK-means algorithm has many shortcomings: the clustering result is sensitive to the initialcenters; the value of k must be given before clustering. To solve the above problems, wepresent an improved K-means algorithm and a k-value learning algorithm in this paper.Improved K-means clustering algorithm is to solve the problem that the clustering results ofthe traditional K-means algorithm is sensitive to initial cluster centers and to improve theperformance of the algorithm at the same time. The improved algorithm is mainly used when thenumber of clusters is known, which can be run with no parameter but the value of k, inputted byuser. In the algorithm, first, we calculate the distance from origin to each data point, and thensort the data point’s according to their distances. At last, divide ding the sorted data points into kequal sets. In each set, we take the middle points as the initial centers. At last, we provide anefficient way of assigning the data points to suitable clusters with reduced time complexity.K-value learning algorithm is to solve the problem that the K-means algorithm cannotdetermine the value of k, which used genetic algorithm. Genetic algorithm is a kind of randomsearch method in which the process of biological evolution is simulated. According to thefitness function and through natural selection, cross restructuring, variation and other geneticoperations, it constantly updates the population to search the optimal solution. In the algorithm,we used binary coding, make each individual represents a k value. According to the fitnessfunction and constantly through genetic operation iteration we can find the optimal clusternumber. In the genetic algorithm, we used the adaptive crossover probability and mutationprobability to speed up the convergence of the algorithm. Finally, we do two experiments to test the effectiveness of improved K-means algorithmand k-value learning algorithm. In the first experiment, we run both the traditional K-meansalgorithm and improved K-means algorithm on five groups of data, and compare the clusteringresults and the time each algorithm cost. In the second one, we run k-value learning algorithmon above five groups of data. The experiment shows that the improved K-means algorithm canget better clustering results with less time, and k-value learning algorithm can find the bestcluster number at a high probability.

Keywords/Search Tags:

cluster analysis, K-means algorithm, genetic algorithm, improved K-means algorithm

PDF Full Text Request

Related items

1	Optimized K-Means Clustering Analysis Based On Genetic Algorithm
2	Application Of An Improved Genetic Algorithm In K-Means Cluster Analysis
3	Improved K-means Clustering Based On Genetic Algorithm
4	Research On Parallel K-means Algorithm Based On Genetic Algorithm
5	Application Of Improved K-MEANS Algorithm In Customer Segmentation
6	Research And Application Of Improved K-means Algorithm In Multivariate Analysis System
7	Application Research Of Improved K-means Algorithm In Big Data Clustering
8	Clustering Analysis Of K-means Based On Improved Genetic Algorithm
9	K-means Cluster Algorithm Based On Improved PSO And Its Application In Recommendation System
10	Research On K-Means Algorithm And Its Integration With Intelligent Algorithms