
Research on a Genetic Algorithm-Based Clustering Method

Posted on: 2007-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Wu
GTID: 2208360182997588  Subject: Computer software and theory
Abstract/Summary:
Data mining has attracted a great deal of attention in the information industry in recent years. The main reasons are that huge amounts of data are widely available and that there is an urgent need to turn such data into useful information and knowledge. The results of knowledge discovery can be applied to data processing to support scientific decision-making.

Cluster analysis is a basic task of data mining and a form of unsupervised learning. Without any prior knowledge, clustering partitions a data set into clusters. Objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Through clustering, one can identify dense and sparse regions and thereby discover overall distribution patterns and interesting correlations among data attributes.

The K-means algorithm is the most widely used clustering algorithm. Its key shortcoming, however, is its sensitivity to the initial values, which makes it easy to become trapped in a local optimum. The genetic algorithm (GA) is a computational model of biological evolution. It has implicit parallelism and the ability to exploit global information effectively. Combining the GA with the K-means algorithm therefore yields a hybrid algorithm with good global and local search capability, which can solve the clustering problem effectively. This thesis presents a new hybrid genetic algorithm for the clustering problem.

This thesis analyzes and studies the genetic algorithm and classical clustering algorithms. It then presents an improved multigroup parallel genetic algorithm based on simulated annealing and applies this algorithm to clustering. The performance of the algorithm is tested. The main work includes:

1. Introducing and analyzing clustering algorithms and the genetic algorithm.
This thesis introduces the basic concepts, tasks, and established methods of data mining. It then introduces and analyzes the genetic algorithm, together with the basic concepts and common algorithms of cluster analysis.

2. Presenting an improved multigroup parallel genetic algorithm based on simulated annealing.
The proposed hybrid genetic algorithm combines an improved simulated annealing genetic algorithm with a multigroup parallel genetic algorithm and incorporates a niche technique. This algorithm effectively suppresses the premature convergence of the simple genetic algorithm.

3. Applying the improved multigroup parallel genetic algorithm based on simulated annealing to cluster analysis.
Drawing on the K-means algorithm, the method uses a floating-point encoding based on cluster centers. Because the parameter K is hard to determine before an experiment, this thesis presents two clustering algorithms: one with a fixed K and one that dynamically selects a suitable K. The algorithms are therefore generally applicable.

4. Testing the performance of the algorithms.
To test the performance of the two clustering algorithms, this thesis applies them to two groups of data. It compares them with other clustering algorithms, such as the K-means algorithm and a GA-based clustering algorithm. Experimental results demonstrate that the proposed methods solve the clustering problem effectively.

Because real applications involve massive amounts of data, clustering algorithms must be able to solve practical problems genuinely well.
The algorithms in this thesis use the idea of the multigroup parallel genetic algorithm to split a large task into small subtasks. Different computers can then execute these subtasks in parallel, which improves the efficiency of the algorithms. Meanwhile, simulated annealing and the niche technique are added to every population to suppress the drawbacks of the simple genetic algorithm. The local search capability of simulated annealing helps partition the data set accurately, and the niche technique maintains the diversity of individuals within the populations.
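The sensitivity of K-means to its initial values can be seen in a minimal Lloyd-style sketch. It assumes that points are equal-length tuples of floats and that the initial centers are drawn at random from the data with a seeded `random.Random`. The names `kmeans`, `assign`, and `sq_dist` are illustrative and are not taken from the thesis.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, seed, max_iter=100):
    """Lloyd-style K-means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    # Random initial centers drawn from the data: the result depends on this step.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # no center moved: a (possibly local) optimum
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

You can observe the dependence on the initial centers by calling `kmeans` on the same data with several seeds and comparing the returned SSE values. On small, well-separated data every seed may reach the same partition, so a harder data set is needed to see distinct local optima.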
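The cluster-center-based floating-point encoding can be sketched as follows. The sketch assumes that a chromosome is simply the K centers concatenated into one flat list of floats. Two choices here are hypothetical and made for illustration only: the fitness `1 / (1 + SSE)`, and the use of a single K-means iteration as local refinement of an individual. The abstract states neither the thesis's actual fitness function nor exactly how K-means is combined with the GA.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(centers):
    """Concatenate K centers into one flat list of floats (the chromosome)."""
    return [float(x) for c in centers for x in c]


def decode(chrom, k):
    """Split a flat chromosome of length k * dim back into k center tuples."""
    dim = len(chrom) // k
    return [tuple(chrom[j * dim:(j + 1) * dim]) for j in range(k)]


def random_chromosome(points, k, rng):
    """Initialize an individual by picking k distinct data points as centers."""
    return encode(rng.sample(points, k))


def within_sse(chrom, points, k):
    """Sum of squared distances from each point to its nearest encoded center."""
    centers = decode(chrom, k)
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def fitness(chrom, points, k):
    """Hypothetical fitness: maps a lower SSE to a higher score in (0, 1]."""
    return 1.0 / (1.0 + within_sse(chrom, points, k))


def kmeans_step(chrom, points, k):
    """One K-means iteration (assign, then recenter) as local refinement."""
    centers = decode(chrom, k)
    groups = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
        groups[nearest].append(p)
    refined = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
    return encode(refined)
```

Because one assign-and-recenter step never increases the SSE, `kmeans_step` can serve as a cheap local search applied to offspring. The GA operators (crossover, mutation) then act directly on the flat float lists.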
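A common way to add simulated annealing to a GA is to let each offspring replace its parent under the Metropolis criterion while the temperature decreases between generations. The sketch below assumes three things:

- the GA maximizes fitness;
- temperatures follow a geometric cooling schedule;
- real-valued genes are mutated with Gaussian noise.

The abstract does not describe the thesis's specific improvement to the simulated annealing genetic algorithm. All function names and parameters here are therefore hypothetical.

```python
import math
import random


def metropolis_accept(parent_fit, child_fit, temperature, rng):
    """Metropolis criterion for maximization.

    A child at least as fit as its parent is always accepted; a worse child is
    accepted with probability exp((child_fit - parent_fit) / temperature).
    """
    if child_fit >= parent_fit:
        return True
    return rng.random() < math.exp((child_fit - parent_fit) / temperature)


def anneal_replace(parents, children, fit, temperature, rng):
    """Pairwise replacement: each child competes only with its own parent."""
    return [child if metropolis_accept(fit(parent), fit(child), temperature, rng) else parent
            for parent, child in zip(parents, children)]


def geometric_cooling(t0, alpha, steps):
    """Yield temperatures t0, t0*alpha, t0*alpha**2, ... for `steps` generations."""
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha


def gaussian_mutate(chrom, sigma, rate, rng):
    """Perturb each gene with probability `rate` by N(0, sigma^2) noise."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in chrom]
```

At high temperatures worse offspring are often accepted, which keeps the search exploratory. As the temperature falls, the rule approaches greedy replacement, which is where the local search behavior described in the abstract comes from.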
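The abstract does not say which niche technique is used. Fitness sharing is one standard option and is sketched here as an assumption. Each individual's fitness is divided by a niche count computed with the sharing function sh(d) = 1 - (d / sigma)^alpha for d < sigma, and 0 otherwise. Individuals in crowded regions are thus penalized and population diversity is preserved. The niche radius `sigma` and the exponent `alpha` are hypothetical parameters.

```python
import math


def euclid(a, b):
    """Euclidean distance between two chromosomes (flat lists of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sharing(d, sigma, alpha=1.0):
    """sh(d) = 1 - (d / sigma) ** alpha inside the niche radius, 0 outside."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0


def shared_fitness(population, raw_fitness, sigma, alpha=1.0):
    """Divide each raw fitness by its niche count so crowded niches are penalized."""
    shared = []
    for a, f in zip(population, raw_fitness):
        # The sum includes `a` itself (distance 0, sharing 1), so niche_count >= 1.
        niche_count = sum(sharing(euclid(a, b), sigma, alpha) for b in population)
        shared.append(f / niche_count)
    return shared
```

With center-based encoding, two chromosomes that list the same centers in a different order are far apart under this Euclidean distance. A practical implementation might therefore sort or match the centers before measuring distance.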
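The multigroup parallel GA can be read as an island model: several subpopulations evolve independently and periodically exchange individuals. The sketch assumes a one-way ring topology in which each island's best individuals replace the worst individuals of the next island. The topology, migration interval, and migrant count are all assumptions. `step` is a hypothetical callback that performs one generation of the per-island GA.

```python
def ring_migrate(islands, fit, n_migrants=1):
    """Ring migration for a multigroup (island-model) GA.

    Each island's n_migrants best individuals are copied to the next island in
    the ring, where they replace that island's worst individuals. Assumes
    n_migrants is smaller than every island's size.
    """
    # Pick emigrants from the pre-migration state so processing order is irrelevant.
    emigrants = [sorted(isl, key=fit, reverse=True)[:n_migrants] for isl in islands]
    migrated = []
    for i, isl in enumerate(islands):
        incoming = emigrants[i - 1]  # island 0 receives from the last island
        keep = sorted(isl, key=fit, reverse=True)[:len(isl) - len(incoming)]
        migrated.append(keep + [list(ind) for ind in incoming])
    return migrated


def evolve_islands(islands, step, fit, generations, interval, n_migrants=1):
    """Evolve each island independently and migrate every `interval` generations.

    `step(island) -> island` is a hypothetical per-island generation (selection,
    crossover, mutation, annealing). Islands do not interact between migrations,
    so these calls are independent of one another.
    """
    for gen in range(1, generations + 1):
        islands = [step(isl) for isl in islands]
        if gen % interval == 0:
            islands = ring_migrate(islands, fit, n_migrants)
    return islands
```

Because the islands are independent between migrations, the list comprehension over `islands` is the natural place to distribute work. For example, it could be replaced with `concurrent.futures.ProcessPoolExecutor.map`, provided `step` is a picklable top-level function. This matches the abstract's point about running subtasks on different computers.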
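The abstract does not explain how the second algorithm selects K dynamically. One common approach is shown here only as an assumption: run a fixed-K clusterer for several candidate values and keep the K that minimizes a cluster validity index. The Davies-Bouldin index is one such index:

DB = (1/K) * sum_i max_{j != i} (S_i + S_j) / M_ij

Here S_i is the mean distance of cluster i's points to its center, and M_ij is the distance between centers i and j. `cluster_fn` is a hypothetical stand-in for the fixed-K algorithm.

```python
import math


def centroids(points, labels, k):
    """Mean of the points in each of the k clusters (assumes no empty cluster)."""
    cents = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        cents.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return cents


def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index (lower is better); needs k >= 2 and distinct centers."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centers[j]) for p in members) / len(members))
    worst_ratio = [
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst_ratio) / k


def choose_k(points, candidate_ks, cluster_fn):
    """Run a fixed-K clusterer per candidate K and keep the lowest DB index.

    `cluster_fn(points, k) -> labels` is hypothetical; it stands in for the
    fixed-K clustering algorithm.
    """
    scores = {}
    for k in candidate_ks:
        labels = cluster_fn(points, k)
        scores[k] = davies_bouldin(points, labels, centroids(points, labels, k))
    return min(scores, key=scores.get), scores
```

The index is undefined when a cluster is empty or two centers coincide, so a real implementation would need to guard against those cases. This sketch only documents the requirement.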
Keywords/Search Tags: Data Mining, Clustering, Genetic Algorithm, Simulated Annealing Genetic Algorithm, Niche, Multigroup