Clustering Ensemble Algorithm Based On Computational Intelligence

Posted on: 2007-01-25  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yang  Full Text: PDF
GTID: 1118360212459916  Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the explosive growth of the Internet, the World Wide Web has become an important tool for obtaining information. How to help users discover potentially useful knowledge on the widely distributed Web, which contains a huge amount of dynamic and semi-structured data, is becoming a research hotspot in information science and technology. Web data mining is a new research domain that addresses this problem; it involves association analysis, classification analysis, clustering analysis, feature analysis, sequential pattern analysis, trend analysis, and so on. As an effective data mining tool, clustering analysis is attracting broad interest and has recently given rise to a variety of new approaches.

As one of the most vital research areas of information science, computational intelligence is attracting intensive attention. Computational intelligence can be viewed as inspired by natural biological systems and human intelligence; its aim is to simulate and reproduce some intelligent human behaviors using computers. Fields concerned with computational intelligence include artificial neural networks, fuzzy logic, and evolutionary computation. Many successful applications have been reported in varied sectors such as medical diagnostics, image processing, pattern recognition, computational biology, financial analysis, and Web analysis.

In order to improve clustering performance, this dissertation systematically investigates clustering ensemble algorithms based on computational intelligence and presents two novel algorithms: one is the multi-ant colonies ensemble algorithm for clustering, and the other is the ART-based clustering ensemble algorithm.
By analyzing different approaches to clustering performance evaluation, an ant-based clustering algorithm is proposed that uses a clustering validity index not only to evaluate the performance of the algorithm, but also to find the optimal number of clusters and reduce outliers. The experimental results show that the proposed ideas and methods for clustering ensembles are both effective and efficient, and are to some extent suitable for document clustering.

In summary, the main research contributions and innovations of this dissertation are as follows:

(1) Improving the traditional ant-based clustering algorithm. In the ant-based clustering algorithm, data objects are first randomly projected onto a plane. Then each ant chooses an object at random and picks up, moves, or drops the object according to a picking-up or dropping probability that depends on the similarity of the current object to the objects within a local region. Finally, clusters are collected from the plane. This dissertation presents several improvements to the ant-based clustering algorithm: different ant speed models are designed to better match ant movement behaviour, instead of a single constant speed; the sigmoid function is used as the probability conversion function to speed up convergence, since only one parameter needs to be adjusted in the sigmoid calculation; and, for outlier processing, the parameter is adjusted at different stages of the algorithm to speed up convergence.

(2) A novel combined algorithm based on ant-based clustering and ant colony optimization. The ant colony optimization algorithm is inspired by cooperative foraging in ants. If a cluster center is regarded as a food source, clustering can be viewed as the process by which an ant colony finds the shortest path between the food source and its nest. Motivated by this behavior, a novel combined algorithm based on ant-based clustering and ant colony optimization is put forward here.
The cluster centers are formed by the improved single ant-based clustering algorithm and then optimized by K-means using the ant transition probability. The two ant-related algorithms are combined to improve clustering performance.

(3) Research on an ant-based clustering algorithm using a validity index that can find the optimal number of clusters and reduce outliers. Cluster analysis is an unsupervised learning technique whose performance is hard to evaluate, because category labels denoting an a priori partition of the objects are unknown. Usually, there are three approaches to assessing cluster validity: external criteria, internal criteria, and relative criteria. External criteria, such as the F-measure, are based on a pre-specified structure. Internal criteria use quantities inherent in the data set to assess the result. Relative criteria compare several results of the same algorithm obtained with different parameter settings, e.g. in terms of cluster compactness and cluster proximity. In this dissertation, the F-measure (an external criterion) and a clustering validity index based on relative criteria are used to evaluate clustering quality, to find the best number of clusters adaptively by means of a multi-representative index, and to reduce outliers at the same time, thereby addressing the difficult problem that most clustering algorithms require the number of clusters to be specified in advance.

(4) The ant-based clustering ensemble algorithm using a hypergraph model, and the multi-ant colonies algorithm with a parallel implementation for clustering ensembles. Inspired by the combination of multiple classifiers, the ensemble of multiple clusterings can be viewed as finding a consensus partition from the output partitions of various clustering algorithms. This is a challenging task that has been proven to be NP-complete. The patterns are unlabeled and, therefore, there is no explicit correspondence between cluster labels in different partitions of an ensemble.
An extra complexity arises when different partitions contain different numbers of clusters, often resulting in a label correspondence problem. This dissertation presents two novel ant-based clustering algorithms. The first uses multiple ant colonies, each with a different ant moving speed model, and each colony generates a clustering; these results are then combined by a hypergraph model and re-clustered by the ant-based algorithm to obtain the consensus clustering. The second is a parallel implementation of the multi-ant colonies with a queen ant agent. Both algorithms clearly improve clustering quality and are suitable for document collections.

(5) A clustering ensemble algorithm using Adaptive Resonance Theory, motivated by neural network ensembles. Adaptive Resonance Theory (ART) is an unsupervised learning neural network that self-organizes in response to input patterns to form stable recognition clusters. The research presented here proposes aggregating clustering results using an ART network. In detail, clusterings produced by any clustering algorithms, such as the ant-based algorithm, are used as the input of the ART neural network. After unsupervised ART learning, the final target clustering is formed, with improved clustering performance.

(6) Developing a topic discovery and visualization system for Web documents. Discovering topics from document clustering results is a very challenging task. In this dissertation, topics are extracted by re-computing term weights according to the revealed cluster structure. Moreover, based on the novel clustering algorithms, a concrete software application was developed that covers downloading Web documents, preprocessing, clustering analysis, topic discovery, and visualization of clustering results.
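
The label correspondence problem described under item (4) can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes the hypergraph model resembles the common representation in which every cluster of every partition becomes one hyperedge (one column of a binary incidence matrix), and it adds a co-association matrix purely as an illustration. The function names `hypergraph_incidence` and `co_association` and the toy partitions are hypothetical.

```python
from typing import List, Sequence


def hypergraph_incidence(partitions: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return an n_objects x n_hyperedges binary matrix (assumed representation).

    Each cluster of each partition contributes one hyperedge, i.e. one column
    that marks which objects belong to that cluster.
    """
    n = len(partitions[0])
    columns: List[List[int]] = []
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same objects")
        for cluster in sorted(set(labels)):
            columns.append([1 if lab == cluster else 0 for lab in labels])
    # Transpose so that each row corresponds to one object.
    return [list(row) for row in zip(*columns)]


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


# p1 and p2 describe the same grouping with swapped label names;
# p3 uses a different number of clusters.
p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]
p3 = [0, 0, 1, 2]

H = hypergraph_incidence([p1, p2, p3])  # 4 objects x 7 hyperedges
C = co_association([p1, p2, p3])

for row in H:
    print(row)
print(C[2][3])  # objects 2 and 3 share a cluster in two of the three partitions
```

Neither matrix depends on the label names themselves, so partitions with permuted labels or different cluster counts can be combined without first matching labels. How the dissertation re-clusters this representation with the ant-based algorithm is not specified in the abstract and is not modeled here.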
Keywords/Search Tags: data mining, computational intelligence, clustering analysis, clustering ensemble, ant-based clustering algorithm (ACA), adaptive resonance theory (ART)