Research On Data Streams Clustering Methods

Posted on: 2009-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Chen
Full Text: PDF
GTID: 2178360272457234
Subject: Computer application technology
Abstract/Summary:
In recent years, clustering analysis, one of the most important data mining tools, has been applied in an increasingly wide range of fields. More and more of these fields produce large data streams. A data stream is characterized by unbounded data volume and a high arrival rate, so traditional clustering algorithms cannot be applied to data stream clustering directly. How to cluster data streams efficiently remains both a difficult problem and an active research topic.

One of the difficulties in clustering a data stream is that the data set can be scanned only a limited number of times, preferably once. To address this problem, a probability-density-based data stream clustering algorithm is proposed. It requires only the newly arrived data, not the entire history, to be kept in memory. It applies the EM algorithm to the newly arrived data and updates the probability density function through an incremental Gaussian mixture model.

The application of summary hierarchies to data stream clustering is also proposed. Two summarization techniques, one based on wavelets and one based on regression, are proposed to maintain the summary hierarchies. The regression-based hierarchy can be computed more accurately, while the wavelet-based hierarchy can be built faster and uses less storage space than the regression-based one.

To validate the performance of these algorithms, we carried out a series of experiments, including simulations and the clustering of real data. The results show that the algorithm runs quickly and that its running time scales linearly with the data dimensionality and the number of clusters. The algorithm can cluster data sets that contain noise, and the clustering quality is good. The experimental results on real data show that the algorithm is effective for clustering data streams.
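
The incremental Gaussian mixture idea in the abstract (run EM on the newly arrived data only, then fold the result into the existing density estimate) can be illustrated with a small sketch. This is not the thesis's algorithm, whose details are not given here. It is one plausible reading, under the assumptions that covariances are diagonal, the number of components is fixed, and the history is summarized by per-component soft counts, means and variances. The class name `IncrementalGMM`, its parameters and the synthetic two-cluster stream are all hypothetical.

```python
import numpy as np


class IncrementalGMM:
    """Diagonal-covariance Gaussian mixture updated one chunk at a time.

    Illustrative sketch only: the history is kept as per-component
    sufficient statistics, so old data points never need to be stored.
    """

    def __init__(self, init_means, init_var=1.0, prior_count=1e-3):
        self.means = np.asarray(init_means, dtype=float)        # shape (k, d)
        self.vars = np.full_like(self.means, float(init_var))   # shape (k, d)
        # Soft number of points seen per component (tiny prior avoids log(0)).
        self.counts = np.full(self.means.shape[0], float(prior_count))

    @staticmethod
    def _responsibilities(X, counts, means, variances):
        """E-step: posterior probability of each component for each point."""
        log_w = np.log(counts / counts.sum())
        diff = X[:, None, :] - means[None, :, :]                # (n, k, d)
        log_p = -0.5 * (np.log(2 * np.pi * variances)[None]
                        + diff ** 2 / variances[None]).sum(axis=2)
        log_r = log_w[None, :] + log_p
        log_r -= log_r.max(axis=1, keepdims=True)               # numerical stability
        r = np.exp(log_r)
        return r / r.sum(axis=1, keepdims=True)

    def update(self, X, n_iter=10, min_var=1e-6):
        """Run EM on the new chunk X, merging with the stored statistics."""
        X = np.asarray(X, dtype=float)
        # Sufficient statistics of everything seen before this chunk.
        old_n = self.counts[:, None]
        old_s1 = old_n * self.means
        old_s2 = old_n * (self.vars + self.means ** 2)
        counts, means, variances = self.counts, self.means, self.vars
        for _ in range(n_iter):
            r = self._responsibilities(X, counts, means, variances)
            # M-step over history plus chunk; the history statistics stay fixed.
            n = old_n + r.sum(axis=0)[:, None]
            means = (old_s1 + r.T @ X) / n
            variances = np.maximum((old_s2 + r.T @ X ** 2) / n - means ** 2, min_var)
            counts = n[:, 0]
        self.counts, self.means, self.vars = counts, means, variances

    @property
    def weights(self):
        return self.counts / self.counts.sum()


# Hypothetical stream: five chunks drawn from two 1-D clusters near -5 and +5.
rng = np.random.default_rng(0)
model = IncrementalGMM(init_means=[[-1.0], [1.0]])
for _ in range(5):
    chunk = np.concatenate([rng.normal(-5, 1, (100, 1)),
                            rng.normal(5, 1, (100, 1))])
    model.update(chunk)  # the chunk can be discarded after this call
```

After each call to `update`, only the counts, means and variances are kept, which is what allows a single pass over the stream. A real implementation would also need to handle components appearing or disappearing over time, which this sketch does not attempt.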
Keywords/Search Tags: data mining, clustering, data stream, probability density, summary hierarchies