Research Of Clustering Method In Data Mining Based On Genetic Algorithm

Posted on:2005-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:S B Su

GTID:2168360122492541

Subject:Computers and applications

Abstract/Summary:

Clustering analysis is one of the most active research topics today. Data clustering, an unsupervised classification method, is the process of grouping similar multi-dimensional data vectors into a number of clusters or bins. Clustering techniques have been applied to a wide range of problems, including pattern recognition, data mining, decision analysis, and prediction, yet they remain imperfect both in theory and in method, and some have serious flaws. Further optimization of clustering algorithms will help not only to complete the theory but also to promote its wider adoption and application. This thesis studies three aspects of clustering analysis in data mining, covering its theory, algorithms, and applications.

Firstly, the classification of popular clustering algorithms is studied. Most existing clustering algorithms are classified and compared from three viewpoints, namely clustering criteria, cluster representation, and algorithm framework, and are analysed and evaluated with respect to hybrid methods, incremental algorithms, automation, and visualization. Analysing their advantages and disadvantages helps to improve existing algorithms and helps users to choose a suitable algorithm for a given dataset so as to obtain the best clustering results. It also provides a basis for further classification of popular algorithms and for establishing clustering benchmarks.

Secondly, genetic algorithm (GA)-based clustering is investigated. Conventional criterion-based clustering algorithms are local search methods that use iterative hill climbing to find an optimal solution. They have two serious defects: they are sensitive to the initial data, and they easily become trapped in local minima. A GA is a computational model of natural evolution, with implicit parallelism and the ability to use global information effectively. This thesis presents modified genetic operators for clustering analysis and, for the first time, introduces a good-point-set-based clustering algorithm, GAmeans, which is characterized by low sensitivity to initialization, robustness, and avoidance of premature convergence. It also presents, for the first time, a hybrid method combining GA and GAmeans. Experiments show that the hybrid method finds better clustering results in terms of overall performance.

Finally, this thesis explores incremental algorithms, which are typically additive and non-iterative and offer several advantages: applicability to large and dynamic databases, lower memory demand, support for parallel processing, and incremental updating. The thesis introduces an incremental grid density-based clustering algorithm, IGDCLUS, which can efficiently find clusters of arbitrary shape and is applicable to environments where data arrive in periodic increments. However, existing algorithms are still sensitive to data order. Highly efficient, self-adaptive, interactive, dynamic incremental clustering algorithms remain to be studied. Clustering techniques in data mining still face many problems and challenges.
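To make the idea of combining a genetic algorithm with k-means style local search concrete, the following is a minimal sketch. It is not the thesis's GAmeans algorithm or its modified genetic operators, which the abstract does not specify, and it does not use good-point sets. The function names (`sse`, `kmeans_step`, `ga_cluster`), the truncation selection, the one-point crossover, the Gaussian mutation, and all parameter values are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centers):
    """Clustering criterion: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(pt, c) for c in centers) for pt in points)

def kmeans_step(points, centers):
    """One hill-climbing step: assign points to centers, then recompute means."""
    groups = [[] for _ in centers]
    for pt in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(pt, centers[i]))
        groups[nearest].append(pt)
    new_centers = []
    for center, group in zip(centers, groups):
        if group:
            new_centers.append(tuple(sum(col) / len(group) for col in zip(*group)))
        else:
            new_centers.append(center)  # an empty cluster keeps its old center
    return new_centers

def ga_cluster(points, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Hypothetical GA over sets of k centers, refined by k-means steps."""
    rng = random.Random(seed)
    dims = len(points[0])
    spans = [max(p[d] for p in points) - min(p[d] for p in points) or 1.0
             for d in range(dims)]
    # Each individual is a list of k centers drawn from the data (assumption).
    pop = [[rng.choice(points) for _ in range(k)] for _ in range(pop_size)]
    best = min(pop, key=lambda ind: sse(points, ind))
    for _ in range(generations):
        pop = [kmeans_step(points, ind) for ind in pop]   # local refinement
        ranked = sorted(pop, key=lambda ind: sse(points, ind))
        if sse(points, ranked[0]) < sse(points, best):
            best = ranked[0]
        parents = ranked[: pop_size // 2]                  # truncation selection
        children = [best]                                  # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = a[:cut] + b[cut:]                      # one-point crossover
            child = [
                tuple(x + rng.gauss(0, 0.05 * span)        # Gaussian mutation
                      if rng.random() < mutation_rate else x
                      for x, span in zip(center, spans))
                for center in child
            ]
            children.append(child)
        pop = children
    best = min(pop + [best], key=lambda ind: sse(points, ind))
    return best, sse(points, best)

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, score = ga_cluster(data, k=2)
    print(sorted(centers), score)
```

The population lets several initializations compete, which is one common way to reduce the sensitivity to starting centers that the abstract attributes to purely local search; the elitism step keeps the best set of centers found so far from being lost to crossover or mutation.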

Keywords/Search Tags:

data mining, genetic algorithm, clustering, incremental algorithm, good-point sets.

Related items

1	Research On K-means Clustering Algorithm Based On Semi-Supervised Good Point Set And Leader
2	Research On Clustering Algorithm Based On Genetic Algorithm And Rough Set Theory
3	Based On Rough Set Data Mining Method
4	Good Point Set Genetic Algorithm Theory And Application
5	Time Planning And Evolution Of Computing The Number Of Applied Research
6	Research On Dynamic Clustering And Incremental In Data Mining
7	FCM Clustering And Research Of Its Increment Algorithm
8	Research On Incremental Mining Algorithm And Application For Dynamic Databases
9	Research On Genetic Algorithm-Based Clustering Method
10	Research On Incremental Clustering Algorithm In Data Mining