Improved K-means Clustering Based On Genetic Algorithm

Posted on:2012-02-03

Degree:Master

Type:Thesis

Country:China

Candidate:M Wang

GTID:2178330335478016

Subject:Computer application technology

Abstract/Summary:

Advances in data acquisition and storage technology have led to the accumulation of large amounts of data in every field. The ability to analyze these data and extract knowledge and regularities from them, however, lags far behind the growing demand for the information they contain, and the discipline of data mining arose in response. Data mining makes it possible to uncover the real value hidden in data, and it remains one of the most active research areas in databases and information-based decision making. Cluster analysis is an important research direction in data mining: clustering can reveal the global distribution pattern of a data set and the latent relationships among data attributes. K-means is a simple partitioning algorithm for clustering. It is easy to implement, converges quickly, and handles large data sets efficiently. It also has several shortcomings: the value of K is difficult to determine in advance, the clustering result is sensitive to the initial cluster centers, and the result is strongly affected by outliers.

This thesis introduces the K-means algorithm and brings in a genetic algorithm to overcome these shortcomings. It describes the genetic algorithm in detail, analyzes how each genetic operation and genetic parameter affects its behavior, and designs an improved K-means clustering algorithm based on the genetic algorithm. The improved algorithm addresses the sensitivity to initial centers, strengthens global search capability, and reduces the influence of isolated points. First, a genetic algorithm searches globally for the best initial cluster centers, and the improved K-means algorithm then uses its local search ability to refine them into the final cluster centers. Second, when the cluster centers are updated during the clustering iterations, the next center of each cluster is taken not as the mean of all objects in the cluster but as the mean of the subset of objects closest to the current center, which limits the effect of isolated points. Finally, the new algorithm is tested on standard data sets, and the experimental results are compared with those of the traditional K-means algorithm and other improved algorithms to demonstrate its effectiveness.
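The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the abstract does not give the chromosome encoding, fitness function, genetic operators, or the size of the "closer" subset. The names `ga_initial_centers`, `trimmed_kmeans`, and `keep`, the use of the sum of squared errors as fitness, truncation selection, one-point crossover, and random-replacement mutation are all hypothetical choices made here for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def assign(points, centers):
    """Group the points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def trimmed_center(members, center, keep=0.8):
    """Mean of the `keep` fraction of members closest to the current center.
    The fraction 0.8 is an assumed value, not one taken from the thesis."""
    nearest = sorted(members, key=lambda p: sq_dist(p, center))
    nearest = nearest[:max(1, int(round(keep * len(nearest))))]
    return tuple(sum(p[d] for p in nearest) / len(nearest)
                 for d in range(len(center)))

def trimmed_kmeans(points, centers, keep=0.8, iters=20):
    """K-means whose update step uses trimmed_center instead of the full mean."""
    centers = list(centers)
    for _ in range(iters):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        new = [trimmed_center(m, c, keep) if m else c
               for m, c in zip(clusters, centers)]
        if new == centers:
            break
        centers = new
    return centers

def ga_initial_centers(points, k, pop_size=20, generations=30,
                       mutation_rate=0.1, seed=0):
    """Assumed GA: each individual is a list of k data points used as centers,
    and a lower SSE means a fitter individual."""
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: sse(points, c))
        parents = population[:pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(points)  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: sse(points, c))

# Two small groups of points plus one isolated point at (30, 30).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
          (30.0, 30.0)]
start = ga_initial_centers(points, k=2)
centers = trimmed_kmeans(points, start, keep=0.8)
```

Starting from the fixed centers `(0.0, 0.0)` and `(10.0, 10.0)` on this toy data, the trimmed update with `keep=0.8` leaves the second center at `(10.5, 10.5)`, whereas the plain mean (`keep=1.0`) is pulled toward the isolated point and ends at `(13.75, 13.75)`. The cost of trimming is visible in the first cluster, where one ordinary point is also discarded and the center settles at `(0.375, 0.375)` rather than `(0.5, 0.5)`.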

Keywords/Search Tags:

clustering algorithms, K-means algorithm, initial cluster centers, isolated points, genetic algorithm

Related items

1	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
2	Improved K-means Algorithm Based On Optimizing Initial Cluster Centers
3	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
4	Research And Application Of K-means Clustering Algorithm
5	Improvements And Implementation Of K-means Clustering Algorithm
6	Research On Initial Cluster Centers Choice Algorithm And Clustering For Imbalanced Data
7	A Genetic Algorithm that Exchanges Neighboring Centers for Fuzzy c-Means Clustering
8	Research And Application Of Fuzzy Clustering Algorithm
9	Research On Improvement Of K-means Clustering Algorithm
10	K-means Algorithm For Optimizing Initial Clustering Centers Based On Improved Density Peak