Study Of K-Means Clustering Based On Genetic Algorithm

Posted on: 2011-06-17; Degree: Master; Type: Thesis
Country: China; Candidate: X T Wu; Full Text: PDF
GTID: 2178330305460321; Subject: Information Science
Abstract/Summary:
Data mining is an interdisciplinary subject that has emerged with the development of information technology, and it is an active research topic in the information and database technology fields. Clustering analysis is one of its important research areas. Clustering is an unsupervised classification method whose goal is to partition data into clusters such that data within a cluster are highly similar and data in different clusters are as dissimilar as possible. As a classical clustering method, K-means has been widely used in market analysis, biology, commerce, text classification, and other areas. The K-means algorithm has strong local search ability, but its results are sensitive to the selection of the initial cluster centers, and it easily falls into a local optimum. The genetic algorithm, by contrast, is an efficient global search method, but its local search ability is poor. This paper combines the advantages of the K-means algorithm and the genetic algorithm, proposes a K-means clustering algorithm based on the genetic algorithm (KBGA), and verifies the validity of the algorithm by experiment.
The main research work of the paper is as follows.
First, K-means clustering is introduced in detail and its advantages and disadvantages are analyzed. On this basis, a solution approach is chosen.
Second, the genetic algorithm, an important method in data mining, is introduced, including its characteristics, basic elements, problems, and workflow.
Third, building on the characteristics of the genetic algorithm, this paper proposes an improved K-means clustering method based on the genetic algorithm (KBGA). Because the traditional K-means algorithm is very sensitive to the initial cluster centers, this paper proposes a similarity-based minimum-maximum principle for selecting the initial cluster centers, together with a corresponding improved K-means clustering algorithm (IKA).
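The abstract does not spell out the minimum-maximum principle, so the following is only a minimal sketch of one common reading of it: a farthest-first rule, in which each new center is the point whose distance to its nearest already-chosen center is largest. Euclidean distance stands in for the similarity measure, the first center is taken as the point farthest from the data mean, and the names maximin_centers, assign, and kmeans are hypothetical. None of these choices is confirmed by the source.

```python
import math


def maximin_centers(points, k):
    """Pick k initial centers by a farthest-first (max-min distance) rule."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: the first center is the point farthest from the data mean.
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # Score each point by its distance to the nearest chosen center,
        # then take the point that maximizes this minimum distance.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def kmeans(points, centers, max_iter=100):
    """Standard K-means iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old center if a cluster becomes empty.
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = maximin_centers(points, 2)
final_centers, labels = kmeans(points, initial)
```

Because the initial centers are chosen deterministically and spread far apart, repeated runs on the same data give the same partition, which is the property a random initialization lacks.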
In addition, K is an important parameter that affects the clustering result. To obtain high-precision clustering results, this paper presents a method for selecting K based on the genetic algorithm, drawing on the characteristics of both the genetic algorithm and the K-means algorithm. The proposed algorithm is described in detail in terms of its coding method, fitness function, selection operator, crossover operator, mutation operator, and so on.
Finally, experiments are carried out to test the performance of the proposed algorithm. The results show that the proposed algorithm obtains better clustering results.
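The abstract names the components of the genetic search for K (coding, fitness, selection, crossover, mutation) but gives no details, so the sketch below fills each slot with an assumed, deliberately simple choice: a chromosome is the integer K itself, fitness is the mean silhouette coefficient of a K-means run, selection is a binary tournament, crossover draws children between the two parents, and mutation shifts K by one. The function names, parameter defaults, and the farthest-first initialization starting from the first data point are all hypothetical and are not taken from the thesis.

```python
import math
import random


def kmeans(points, k, iters=20):
    """K-means with a deterministic farthest-first start; returns labels."""
    centers = [points[0]]  # assumption: start from the first data point
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centers[i]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p) for p in points]


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for key, g in groups.items() if key != lab)
        total += (b - a) / (max(a, b) or 1.0)
    return total / len(points)


def select_k(points, k_max, pop_size=8, generations=10, seed=0):
    """Search for K in [2, k_max] with a small genetic algorithm (pop_size must be even)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(k):
        # K-means here is deterministic, so each K is evaluated only once.
        if k not in cache:
            cache[k] = silhouette(points, kmeans(points, k))
        return cache[k]

    population = [rng.randint(2, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: binary tournament on fitness.
        parents = [max(rng.sample(population, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            # Crossover: each child takes a value between its two parents.
            lo, hi = min(a, b), max(a, b)
            children += [rng.randint(lo, hi), rng.randint(lo, hi)]
        # Mutation: with probability 0.2, shift K by one, clamped to [2, k_max].
        children = [
            min(k_max, max(2, c + rng.choice((-1, 1)))) if rng.random() < 0.2 else c
            for c in children
        ]
        # Elitism: the best individual of this generation always survives.
        best = max(population, key=fitness)
        population = children[:-1] + [best]
    return max(population, key=fitness)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
chosen_k = select_k(points, k_max=5)
```

The silhouette coefficient is used here because, unlike the within-cluster sum of squares, it does not improve automatically as K grows, so it can compare partitions with different numbers of clusters. For a range of K this small, an exhaustive scan would work just as well; the genetic loop is shown only to illustrate how the operators named in the abstract fit together.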
Keywords/Search Tags: data mining, clustering analysis, genetic algorithm, K-means