Research Of Clustering Method In Data Mining Based On Genetic Algorithm

Posted on: 2005-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: S B Su
Full Text: PDF
GTID: 2168360122492541
Subject: Computers and applications
Abstract/Summary:
Cluster analysis is one of the most active research topics today. Data clustering, an unsupervised classification method, is the process of grouping similar multi-dimensional data vectors into a number of clusters or bins. Clustering techniques have been applied to a wide range of problems, including pattern recognition, data mining, decision analysis, and prediction, yet they remain imperfect both theoretically and methodologically, and some have serious flaws. Further optimizing clustering algorithms will help not only to refine the theory but also to promote its adoption and application. This thesis studies three aspects of cluster analysis in data mining, covering its theory, algorithms, and applications.

First, the classification of popular clustering algorithms is studied. Most existing clustering algorithms are classified and compared from three viewpoints, namely clustering criteria, cluster representation, and algorithm framework, and are analysed and evaluated with respect to hybrid methods, incremental algorithms, automation, and visualization. Analysing their advantages and disadvantages helps to improve existing algorithms and helps users choose a suitable algorithm for a given dataset in order to obtain optimized clustering results. It is also the basis for further classification of popular algorithms and for establishing a clustering benchmark.

Second, genetic algorithm (GA)-based clustering is investigated. Conventional criterion-based clustering algorithms are local search methods that use iterative hill climbing to find an optimal solution, and they have two serious defects: sensitivity to the initial data and a tendency to become trapped in local minima. GA is a computational model of natural evolution, with implicit parallelism and the ability to use global information effectively.
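The contrast between local hill climbing and a population-based GA search can be illustrated with a minimal sketch. It assumes that a chromosome encodes k centroids, that the clustering criterion is the sum of squared errors (SSE), and that the operators are tournament selection, one-point crossover, and Gaussian mutation with elitism. These choices, along with the names `sse` and `ga_cluster` and all parameter values, are illustrative assumptions and not the operators proposed in the thesis.

```python
import random

# Illustrative sketch only: encoding, operators, and parameters are
# assumptions, not the method described in the thesis.


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


def ga_cluster(points, k, pop_size=30, generations=50, mutation_rate=0.2, seed=0):
    """Evolve k centroids; a chromosome is a list of k centroid coordinates."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Per-dimension data range, used to scale the mutation step.
    span = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dim)]

    def tournament(scored):
        # Pick the fitter of two random individuals (lower SSE wins).
        return min(rng.sample(scored, 2), key=lambda pair: pair[0])[1]

    # Initial population: centroids sampled from the data points.
    population = [[list(rng.choice(points)) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((sse(points, c), c) for c in population), key=lambda pair: pair[0])
        # Elitism: carry the best chromosome over unchanged.
        next_pop = [[list(c) for c in scored[0][1]]]
        while len(next_pop) < pop_size:
            a, b = tournament(scored), tournament(scored)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = [list(c) for c in a[:cut] + b[cut:]]  # one-point crossover
            for c in child:
                if rng.random() < mutation_rate:
                    d = rng.randrange(dim)
                    c[d] += rng.gauss(0.0, 0.1 * span[d])  # Gaussian mutation
            next_pop.append(child)
        population = next_pop
    best = min(population, key=lambda c: sse(points, c))
    return best, sse(points, best)
```

A call such as `ga_cluster(points, k=2)` returns the best centroid list found and its SSE. Because many candidate solutions are evaluated in each generation, the search depends less on any single initial guess than one hill-climbing run would.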
This thesis presents modified genetic operators for cluster analysis and introduces, for the first time, GAmeans, a clustering algorithm based on good-point sets, which is characterized by low sensitivity to initialization, robustness, and avoidance of premature convergence. It also presents, for the first time, a hybrid method combining GA and GAmeans. Experiments show that the hybrid method performs well overall and finds better clustering results.

Finally, this thesis explores incremental algorithms, which are typically additive and non-iterative and offer advantages such as applicability to large and dynamic databases, lower memory requirements, support for parallel processing, and incremental updating. It introduces IGDCLUS, an incremental grid density-based clustering algorithm that efficiently finds clusters of arbitrary shape and is applicable in periodically incremental environments. However, existing algorithms are still sensitive to data order. Highly efficient, self-adaptive, interactive, and dynamic incremental clustering algorithms remain to be studied. Clustering techniques in data mining still face many problems and challenges.
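The general idea behind an incremental grid density-based method can be sketched as follows. This is not IGDCLUS, whose details the abstract does not give. It is a generic illustration under stated assumptions: space is divided into equal-sized cells, a cell is "dense" when it holds at least `min_points` points, and a cluster is a connected group of adjacent dense cells. The class name `GridDensityClusterer` and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product

# Generic sketch of grid density clustering with incremental inserts.
# It is an assumption-based illustration, not the IGDCLUS algorithm.


class GridDensityClusterer:
    def __init__(self, cell_size, min_points):
        self.cell_size = cell_size
        self.min_points = min_points
        self.counts = defaultdict(int)  # cell index -> number of points

    def _cell(self, point):
        # Map a point to the integer index of the grid cell containing it.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point):
        """Incremental update: only one cell counter changes per new point."""
        self.counts[self._cell(point)] += 1

    def clusters(self):
        """Group adjacent dense cells into clusters via breadth-first search."""
        dense = {c for c, n in self.counts.items() if n >= self.min_points}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            queue, component = deque([start]), set()
            while queue:
                cell = queue.popleft()
                component.add(cell)
                # Visit all neighbouring cells, including diagonal ones.
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    nb = tuple(a + b for a, b in zip(cell, offset))
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            result.append(component)
        return result
```

New points are absorbed by calling `insert` without revisiting earlier data, and `clusters` can be called again after each batch. Because clusters are built from connected cells and not from distances to a centre, they can take arbitrary shapes at the resolution of the grid.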
Keywords/Search Tags:data mining, genetic algorithm, clustering, incremental algorithm, good-point sets.