
Mixed Attributes And Hybrid Schemes Based Evolutionary Clustering Algorithm

Posted on: 2011-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zheng
Full Text: PDF
GTID: 2178330332488188
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Clustering is an important technique that has been studied widely in areas such as data mining, machine learning, and image processing. Evolutionary clustering is a new and significant family of methods in clustering research. This thesis proposes a novel evolutionary clustering algorithm for mixed-type data and two new hybrid-scheme evolutionary clustering algorithms. First, the clustering problem and evolutionary computation are explained. The proposed algorithms are then discussed one by one, with experiments and analysis.

In the second chapter, a novel unsupervised evolutionary clustering algorithm for mixed-type data is proposed. It is based on the K-prototype (KP) algorithm and applies an evolutionary framework and evolutionary operators to obtain a reasonable partition of mixed-type datasets. K-prototype is a well-known partitional clustering algorithm for mixed-type data. However, because it follows the K-means paradigm, it is sensitive to initialization and easily converges to a local optimum. Global search ability is one of the most important advantages of evolutionary algorithms (EAs), so an EA framework is introduced to help KP overcome these flaws. Experiments on synthetic and UCI datasets show that the resulting algorithm, EKP, is more robust and generates much better results than KP.

Chapter three proposes a new hybrid evolutionary clustering algorithm with multi-population, which is one kind of hybrid-scheme evolutionary clustering algorithm. It applies a multi-population strategy and a new extraction procedure to transmit valuable information from parents in different candidate populations to their offspring. The dataset is modeled as a graph, and the graph-based KWNC criterion is used as the fitness function to select individuals. A reduction strategy is also applied to speed up the whole algorithm. When the evolution converges, different strategies, designed for different evolutionary outcomes, are used to terminate the algorithm and obtain the final result.
In experiments, artificial datasets and UCI datasets are used to test the proposed algorithm. The results show that the proposed algorithm is capable of finding better clustering results than both the candidate clustering algorithms embedded in it and a traditional evolutionary K-means algorithm.

In chapter four, another hybrid-scheme evolutionary clustering algorithm is proposed: an evolutionary clustering algorithm with multi-population and graph-based search. This algorithm uses a framework, fitness function, extraction scheme, and termination strategy similar to those of the algorithm proposed in chapter three. The diversity of the hierarchical clustering candidate population is enhanced by using three different hierarchical clustering algorithms to generate it. Two graph-based search strategies, neighborhood-based search and cluster linkage-based search, are used to find more reasonable partitions of the dataset. All of these designs enhance the performance of the algorithm. Experiments on artificial datasets and UCI datasets show that the proposed algorithm not only has the same advantages as the one proposed in chapter three but also performs much better than the other algorithms compared. Moreover, an analysis of some important parameters is given to explain the algorithm further.

This work was supported by the National Natural Science Foundation of China (Grant No. 60703107), the National High Technology Research and Development Program (863 Program) of China (Grant No. 2009AA12Z210), the Program for New Century Excellent Talents in University (Grant No. NCET-08-0811), the Program for New Scientific and Technological Star of Shaanxi Province (Grant No. 2010KJXX-03), and the Fundamental Research Funds for the Central Universities (Grant No. K50510020001).
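For readers unfamiliar with the K-prototype algorithm that chapter two builds on, the following is a minimal sketch of the mixed-type dissimilarity commonly used in K-prototype clustering: squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical attributes. The function names, the `gamma` weight, and the sample values are illustrative assumptions and are not taken from the thesis; the evolutionary framework of EKP is not shown.

```python
from typing import List, Sequence, Tuple

# A prototype (or data point) is a pair: (numeric attributes, categorical attributes).
Mixed = Tuple[Sequence[float], Sequence[str]]


def kprototype_dissimilarity(point: Mixed, prototype: Mixed, gamma: float = 1.0) -> float:
    """Mixed-type dissimilarity: squared Euclidean part + gamma * mismatch count.

    `gamma` (a hypothetical default of 1.0 here) balances the influence of the
    categorical attributes against the numeric ones.
    """
    x_num, x_cat = point
    p_num, p_cat = prototype
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(point: Mixed, prototypes: List[Mixed], gamma: float = 1.0) -> int:
    """Return the index of the prototype closest to `point` (the assignment step)."""
    distances = [kprototype_dissimilarity(point, p, gamma) for p in prototypes]
    return distances.index(min(distances))


if __name__ == "__main__":
    prototypes = [([0.0, 0.0], ["a"]), ([5.0, 5.0], ["b"])]
    print(nearest_prototype(([4.0, 4.0], ["b"]), prototypes))
```

Because the assignment step depends only on the current prototypes, a poor initial choice can trap the procedure in a local optimum, which is the weakness the abstract says the evolutionary framework is meant to address.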
Keywords/Search Tags: Clustering, Evolutionary Computation, Mixed Type Data, Hybrid Schemes, Multi-population, Graph-based Search
Related items