Font Size: a A A

Research On Spectral Clustering Algorithm Based On BIRCH And GAD

Posted on:2011-10-01Degree:MasterType:Thesis
Country:ChinaCandidate:G H DingFull Text:PDF
GTID:2178360305462518Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Spectral clustering can produce higher quality clusterings in small data sets than the traditional clustering algorithm (e.g.:k-means algorithm), but, due to its overall computational time complexity of O(n3), where n is the number of instances of data set, spectral clustering can not be used to very large data sets. Real-world data mining applications often involve huge data sets, which are placed in memory to do spectral clustering is not feasible. Many real-world data mining applications involve millions or billions of data records where even multiple scans of the entire data are too expensive to perform. This paper presents two improved spectral clustering algorithms (BIRCH-Based SPC& GAD-Based SPC) to reduce the consumption of computing resources and improve the computing speed. First of all, with the BIRCH or GAD algorithm to preprocess the original data sets, we can get some representative objects, and then, we have traditional spectral clustering applied to the representative data objects, and finally we recover all instances of their clustering according to core representative data objects. Our two improved spectral clustering algorithms use the limited computing resources (e.g.:memory, CPU) to maximize the accuracy of spectral clustering. Extensive experiments show that the proposed two algorithms can achieve significant speedups with little degradation in cluster accuracy. At a single ordinary PC, the two algorithms can cluster data sets with a million instances in just a few minutes of time.
Keywords/Search Tags:Spectral clustering, BIRCH, CF-Tree, GAD, Activity Detection
PDF Full Text Request
Related items