Application And Research Of Parallel Genetic Algorithm In Data Mining Of K-Medians

Posted on:2005-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2168360125463931

Subject:Computer system architecture

Abstract/Summary:

Data mining has four important kinds of task: clustering, classification, association rules, and sequence patterns. Clustering is the most important of these, and efficiency and accuracy are the main concerns in data mining. To improve both, many techniques have been applied, including genetic algorithms (GA), neural networks, fuzzy theory, and rough set theory. This thesis applies a parallel genetic algorithm to the K-medians algorithm to improve the efficiency and accuracy of K-medians.

GA is an effective approach to combinatorial optimization problems. It is a search algorithm based on natural selection and evolution. Many simulation experiments indicate that GA can often produce a satisfactory solution for small and medium-scale applications within an acceptable time, but for large or very large tasks the simple serial GA is less effective. A prominent problem that hinders the application of simple GA is premature convergence. Parallel techniques are therefore combined with traditional GA, exploiting GA's inherent parallelism to improve its efficiency and reduce premature convergence.

K-medians is a partition-based clustering algorithm that is widely used in current cluster analysis. Its shortcoming is that it easily falls into a local optimum, which lowers its efficiency. In addition, the number K of median points is usually chosen from experience and is therefore not exact. To address these deficiencies, GA is applied to K-medians clustering with the aim of improving its efficiency and accuracy. To apply GA to K-medians, this thesis puts forward a corresponding coding scheme, a fitness function, a parallel computing model, and a relevant migration policy.
The experiments show that using parallel GA to solve K-medians clustering problems improves efficiency and accuracy.

This thesis uses PVM to connect several PCs into a parallel computing environment on Linux. The parallel computing model is a coarse-grained master/slave model. First, the master machine sends individuals to every slave machine, and the slave machines begin computing. At intervals, each slave migrates individuals to the master according to a migration policy and, at the same time, brings back individuals that other slaves have sent to the master, then continues computing. When the halting condition is satisfied, the slaves stop computing. Finally, the author analyses the experimental data, compares the results with the simple genetic algorithm, and evaluates the computational speedup.
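The master/slave migration cycle described in the abstract can be illustrated with a minimal single-process sketch. The abstract does not give the thesis's coding scheme, fitness function, or migration policy, so everything specific here is an assumption made for illustration: a chromosome is a sorted tuple of k data-point indices used as medians, fitness is the sum of Manhattan distances to the nearest median, each "slave" is simulated as an island in a plain loop rather than a PVM task, and the migration policy sends each island's best individual to the master and replaces the receiving island's worst individuals. All function and parameter names (`cost`, `evolve`, `parallel_ga`, `interval`, and so on) are hypothetical.

```python
import random

def cost(points, medians):
    """Sum of Manhattan distances from each point to its nearest median (assumed fitness)."""
    return sum(
        min(sum(abs(a - b) for a, b in zip(p, points[m])) for m in medians)
        for p in points
    )

def evolve(points, population, k, generations, rng):
    """Run a few GA generations on one island (stands in for one slave machine)."""
    n = len(points)
    for _ in range(generations):
        population.sort(key=lambda ind: cost(points, ind))
        survivors = population[: len(population) // 2]  # truncation selection
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            pool = sorted(set(a) | set(b))   # crossover: pick k medians from both parents
            child = rng.sample(pool, k)
            if rng.random() < 0.2:           # mutation: swap one median for an unused point
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(sorted(child)))
        population = survivors + children
    return population

def parallel_ga(points, k, n_slaves=3, pop_size=8, epochs=5, interval=3, seed=0):
    """Simulated coarse-grained master/slave GA with periodic migration."""
    rng = random.Random(seed)
    n = len(points)
    fitness = lambda ind: cost(points, ind)
    # Master creates and "sends" an initial population to every slave.
    islands = [
        [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
        for _ in range(n_slaves)
    ]
    best = None
    for _ in range(epochs):  # halting condition (assumed): fixed number of epochs
        # Each slave computes independently for `interval` generations.
        islands = [evolve(points, pop, k, interval, rng) for pop in islands]
        # Migration: every slave sends its best individual to the master.
        migrants = [min(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            incoming = [m for j, m in enumerate(migrants) if j != i]
            if incoming:
                pop.sort(key=fitness)
                pop[-len(incoming):] = incoming  # worst individuals are replaced
        best = min(migrants + ([best] if best else []), key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medians, total = parallel_ga(data, k=2, seed=1)
    print("median indices:", medians, "cost:", total)
```

In a real PVM or MPI setting, each call to `evolve` would run on a separate machine and the migrant exchange would be a message to and from the master. The sequential loop only shows the order of the steps, not the speedup.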

Keywords/Search Tags:

Parallel Genetic Algorithm, Clustering, K-Median, PVM, Parallel Computing

Related items

1	Mpi-based Parallel Genetic Algorithm Optimize Logistics Route
2	Parallel Clustering Algorithm Based On MapReduce
3	Study On Inverse Problems In Electromagnetics Based On Genetic Algorithm And Parallel Computing
4	Parallel Clustering Algorithm For Large-scale Biological Data Sets
5	Research And Development Of The Optimal Layout System For Two Dimensional Irregular Parts
6	Research On Parallel Non-Intervention Document Clustering Algorithm
7	Parallel Genetic Algorithm And Its Distributed Application On The Combination Optimum Problem
8	Mpi-based Parallel Genetic Algorithm 0-1 Knapsack Problem
9	Design Of S-boxes Optimum Algorithms
10	Research On Parallel Hierarchical Clustering Algorithm