Improved K-means Clustering Based On Genetic Algorithm

Posted on: 2012-02-03; Degree: Master; Type: Thesis
Country: China; Candidate: M Wang; Full Text: PDF
GTID: 2178330335478016; Subject: Computer application technology
Abstract/Summary:
With the development of technology, data acquisition and storage have improved greatly, and large amounts of data have accumulated in every field. However, the ability to analyze these data and extract knowledge and patterns from them falls far short of people's growing demand for the information hidden in the data, and so the field of data mining came into being. Data mining enables people to understand the true underlying value of data, and it is currently one of the most active research frontiers in the database and information decision-making fields. Cluster analysis is an important research direction in data mining; clustering can reveal the global distribution pattern of the data and the potential relationships among data attributes. The K-means algorithm is a simple partitioning algorithm for clustering: it is easy to implement, converges quickly, and handles large data sets effectively. However, it has some deficiencies: the value of K cannot be determined in advance, the clustering result is sensitive to the initial cluster centers, and it is strongly affected by outliers.

This thesis introduces the K-means algorithm and brings in a genetic algorithm to overcome these disadvantages. It describes the genetic algorithm in detail, analyzes the impact of each genetic operation and genetic parameter on the genetic algorithm, and designs an improved K-means clustering algorithm based on the genetic algorithm that resolves the sensitivity to the initial centers, improves the global search capability, and reduces the impact of isolated points. First, a genetic algorithm globally searches for the best initial cluster centers, and the improved K-means algorithm is then run from those centers to find the best cluster centers through its local search ability. Second, when the cluster centers are updated during the clustering iterations, the next generation of cluster centers is taken not as the mean of all objects in a class but as the mean of the subset of objects that lie closer to the center, which reduces the influence of isolated points. Finally, the new algorithm is tested on standard data sets, and the experimental results are compared with those of the traditional K-means algorithm and other improved algorithms to demonstrate the effectiveness of the proposed algorithm.
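
The two steps described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the thesis's actual implementation: the abstract does not give the chromosome encoding, the genetic operators, the fitness function, or the size of the "closer" subset. Here a chromosome is assumed to be a set of K point indices, fitness is the sum of squared errors, selection is simple truncation, and the hypothetical parameter `keep` (default 0.8) is the fraction of each cluster's nearest members used to recompute its center. All function names are made up for this example.

```python
import numpy as np

def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

def trimmed_update(points, centers, keep=0.8):
    """Move each center to the mean of the `keep` fraction of its members
    that lie closest to it (assumed reading of "part of the subset")."""
    centers = np.asarray(centers, dtype=float)
    labels = assign(points, centers)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) == 0:
            continue  # an empty cluster keeps its previous center
        d = np.linalg.norm(members - centers[j], axis=1)
        n_keep = max(1, int(np.ceil(keep * len(members))))
        new_centers[j] = members[np.argsort(d)[:n_keep]].mean(axis=0)
    return new_centers

def ga_initial_centers(points, k, pop_size=20, generations=30,
                       mutation_rate=0.1, rng=None):
    """Search for good initial centers with a very small genetic algorithm.
    Assumes pop_size >= 4 and len(points) >= k."""
    rng = np.random.default_rng(rng)
    n = len(points)
    # Assumed encoding: a chromosome is k distinct point indices.
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda idx: sse(points, points[idx]))
        parents = pop[: pop_size // 2]  # truncation selection (keeps the best)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw k indices from the union of both parents.
            pool = np.union1d(parents[a], parents[b])
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < mutation_rate:
                # Mutation: replace one gene with a random point index.
                child[rng.integers(k)] = rng.integers(n)
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda idx: sse(points, points[idx]))
    return points[best].astype(float)

def ga_kmeans(points, k, keep=0.8, max_iter=100, rng=None):
    """GA-chosen initial centers followed by trimmed-mean K-means iterations."""
    centers = ga_initial_centers(points, k, rng=rng)
    for _ in range(max_iter):
        new_centers = trimmed_update(points, centers, keep)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Calling `ga_kmeans(points, k)` on an `(n, d)` NumPy array returns the final centers and a label for each point. With `keep=1.0` the update reduces to the ordinary K-means mean, so the parameter makes it easy to compare the trimmed update with the traditional one on the same data.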
Keywords/Search Tags: clustering algorithms, K-means algorithm, initial cluster centers, isolated points, genetic algorithm