Research Of K-Means Clustering In Data Mining Based On Genetic Algorithm

Posted on: 2010-07-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Zhao
GTID: 2178360275462180 | Subject: Control theory and control engineering
Abstract/Summary:
Data mining is a new discipline that has emerged with the development of information technology, and it is an active research topic in information and database technology. Its purpose is to discover hidden, useful knowledge in huge amounts of data in order to support scientific decision-making.

Cluster analysis is one of the important topics in data mining. Clustering is an unsupervised classification method: without any prior knowledge, it partitions a data set into clusters such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. As a classical clustering method, k-means has been widely used in commerce, market analysis, biology, text classification, and other fields. However, k-means has two severe defects: it is sensitive to the initial data, and it easily becomes trapped in a local optimum. To address these defects, a hybrid clustering algorithm that combines the genetic algorithm with the k-means algorithm is proposed, and the performance of the hybrid algorithm is tested.

The main research work of the paper is as follows.

Firstly, clustering analysis technology is introduced in detail, most existing clustering algorithms are classified, and their advantages and disadvantages are analyzed. On this basis, the k-means method is chosen as the research target.

Secondly, the genetic algorithm, an important method in data mining, is introduced, and its characteristics, basic elements, and application flow are described in detail.

Thirdly, based on the characteristics of the genetic algorithm and the k-means method, a new k-means clustering method based on an improved genetic algorithm is proposed. The proposed algorithm is described in detail in terms of its coding method, fitness function, selection operator, crossover operator, mutation operator, k-means operator, and other aspects.

Finally, to test the performance of the proposed algorithm, the paper presents three simulation experiments. The simulation results show that, compared with the k-means method, the proposed algorithm obtains a better clustering result.
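
The abstract names the components of the hybrid algorithm but does not give their definitions, so the following is only a minimal sketch of one way such a hybrid could be put together, not the thesis's algorithm. Every concrete choice here is an assumption made for illustration: a chromosome is a list of k centroids, fitness is the sum of squared errors (lower is better), selection is a two-way tournament, crossover is one-point, mutation adds Gaussian noise, and the "k-means operator" is a single assign-and-recompute step. All function names and parameter defaults are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """Assumed k-means operator: assign points, then recompute centroids."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
        groups[nearest].append(p)
    updated = []
    for c, g in zip(centroids, groups):
        if g:
            updated.append(tuple(sum(x) / len(g) for x in zip(*g)))
        else:
            updated.append(c)  # an empty cluster keeps its old centroid
    return updated

def tournament(population, scores, rng):
    """Assumed selection operator: the better of two random individuals."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if scores[a] <= scores[b] else population[b]

def crossover(p1, p2, rng):
    """Assumed crossover operator: one-point exchange of centroids."""
    if len(p1) < 2:
        return list(p1)
    cut = rng.randrange(1, len(p1))
    return list(p1[:cut]) + list(p2[cut:])

def mutate(centroids, rate, scale, rng):
    """Assumed mutation operator: Gaussian noise on some coordinates."""
    return [tuple(x + rng.gauss(0, scale) if rng.random() < rate else x
                  for x in c)
            for c in centroids]

def ga_kmeans(points, k, pop_size=20, generations=30,
              mutation_rate=0.1, mutation_scale=0.5, seed=0):
    """Return (best_centroids, best_sse) found by the hybrid search."""
    rng = random.Random(seed)
    # Each individual starts as k distinct data points used as centroids.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = min(population, key=lambda c: sse(points, c))
    for _ in range(generations):
        scores = [sse(points, c) for c in population]
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(population, scores, rng),
                              tournament(population, scores, rng), rng)
            child = mutate(child, mutation_rate, mutation_scale, rng)
            children.append(kmeans_step(points, child))
        population = children
        best = min(population, key=lambda c: sse(points, c))
    return best, sse(points, best)
```

Because the best individual is carried over unchanged in every generation, the best error in this sketch never increases from one generation to the next. Maintaining a population of different starting centroids is what is meant to reduce the dependence on any single initialization, while the k-means step gives each candidate a local refinement.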
Keywords/Search Tags: Data mining, Cluster analysis, Genetic algorithm, k-means, IGKA