Research On Partition-Based Online Clustering Algorithms

Posted on: 2013-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wu
Full Text: PDF
GTID: 2298330422479918
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, the data stream has become a commonly used model. Because of the characteristics of data streams, traditional clustering algorithms face many challenges. A number of data stream clustering algorithms have been proposed, but they still have shortcomings: when cluster boundaries are not linearly separable, or when the data dimensionality is too high, partition-based data stream clustering algorithms achieve low accuracy. This thesis therefore proposes two online partition-based clustering algorithms, based on kernel methods and feature selection, respectively. The details are as follows:

Firstly, a new online kernel fuzzy C-means (OKFCM) algorithm for large-scale datasets, based on kernel fuzzy C-means (KFCM), is proposed. OKFCM not only inherits the robust clustering properties of the original KFCM but is also suitable for clustering data streams. In addition, taking into account the difficulty of selecting kernel parameters, a new online multiple kernel fuzzy C-means (OMKFCM) algorithm is derived by combining multiple kernels with different parameters, based on multiple kernel learning methods. The new online kernel algorithms thus effectively mitigate the problem of selecting kernel parameters and, to some extent, also have the advantages of ensemble clustering.

Secondly, a new online local adaptive fuzzy C-means (OLAFCM) algorithm for large high-dimensional datasets, based on fuzzy C-means (FCM), is proposed. OLAFCM discovers clusters spanned by different combinations of dimensions via local feature weights assigned to each cluster, so it can effectively mitigate the "curse of dimensionality" in high-dimensional data streams. In addition, because initializing the number of classes depends on domain knowledge, a competitive agglomeration-based online local adaptive fuzzy C-means (OLAFCM_CA) algorithm is also derived. The newly proposed algorithms not only alleviate the sensitivity to the initial number of classes but also perform better on artificial and real datasets than state-of-the-art partition clustering algorithms based on global dimensionality reduction methods.
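For orientation, the batch KFCM that OKFCM builds on can be sketched as below. This is a minimal illustration of the standard textbook formulation (Gaussian kernel, prototypes kept in the input space), not the thesis's online algorithm, whose update rules are not given in this abstract. The function names, the `init_centers` argument, and the default values for `m` and `sigma` are assumptions made for the example.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)


def kfcm(points, init_centers, m=2.0, sigma=2.0, n_iter=50):
    """Batch kernel fuzzy C-means with input-space prototypes (illustrative).

    points       : list of equal-length numeric tuples
    init_centers : starting prototypes (hypothetical argument for this sketch)
    m            : fuzzifier, must be > 1
    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, as computed in the final iteration.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        memberships = []
        kernels = []
        for x in points:
            k_row = [gaussian_kernel(x, v, sigma) for v in centers]
            # u_ik is proportional to (1 - K(x_k, v_i))^(-1 / (m - 1));
            # the floor avoids division by zero when x coincides with v.
            w = [max(1.0 - k, 1e-12) ** (-1.0 / (m - 1.0)) for k in k_row]
            total = sum(w)
            memberships.append([wi / total for wi in w])
            kernels.append(k_row)
        for i in range(len(centers)):
            # v_i = sum_k u_ik^m K(x_k, v_i) x_k / sum_k u_ik^m K(x_k, v_i)
            weights = [memberships[k][i] ** m * kernels[k][i]
                       for k in range(len(points))]
            total = sum(weights)
            if total > 0.0:
                centers[i] = [
                    sum(wk * x[d] for wk, x in zip(weights, points)) / total
                    for d in range(dim)
                ]
    return centers, memberships


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    final_centers, u = kfcm(data, init_centers=[(1.0, 1.0), (4.0, 4.0)])
    print(final_centers)
    print(u)
```

Because the kernel weight K(x_k, v_i) decays quickly with distance, points far from a prototype contribute little to its update, which is one common explanation for the robustness of KFCM that the abstract refers to.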
Keywords/Search Tags: kernel methods, feature selection, online kernel fuzzy C-means, online multiple kernel fuzzy C-means, online local adaptive fuzzy C-means, competitive agglomeration-based online local adaptive fuzzy C-means