The Application And Research Of Incremental Clustering On Temporal Data Streams

Posted on: 2010-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Zhang
Full Text: PDF
GTID: 2178360272478962
Subject: Management Science and Engineering
Abstract/Summary:
Cluster analysis is an important area of data mining. In recent years, with the rapid development of computer technology, the ability to acquire data has improved greatly, and the number of ways to acquire it keeps growing. Data streams, as a special source of data, have attracted increasing attention. There are many kinds of data streams, such as web clickstreams, weather information, telephone call records, and satellite data streams. Because a data stream is unbounded and its data cannot be read more than once, traditional clustering algorithms cannot handle it, and new algorithms must be developed. As a result, researchers and practitioners are facing new challenges.

This thesis studies temporal data streams. It first gives the concept and definition of a data stream. It then proposes a clustering algorithm, TMSC (temporal multiple-dimension subspace α-cluster), which finds clusters based on the subspace α-cluster. TMSC uses a sliding window so that the entire data set never has to be processed at once. The algorithm also has a maintenance stage, called the incremental stage, in which old data does not need to be recomputed; only newly arrived data has to be processed. As a result, an incremental update takes less time than rerunning a traditional algorithm. The last part of the thesis applies the algorithm to stock data, using different sets of parameters to find a number of different clusters. The results appear meaningful.

The main innovations of the TMSC algorithm are: 1) extending the approach from one-dimensional data streams to multi-dimensional data streams; 2) improving cluster pruning; 3) giving a precise definition of how (m+1)-level clusters are found from m-level clusters, and proving it; 4) fixing a problem in the original algorithm, which misses clusters during the incremental update stage, by retaining all clusters.
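
The sliding-window and incremental-update ideas described in the abstract can be illustrated with a minimal sketch. This is not the TMSC algorithm, whose details are not given here; it swaps in a simple leader-style clustering so that the windowing logic is visible. The class name SlidingWindowClusters, the parameters window_size and alpha, and the Chebyshev distance test are all assumptions made for illustration only.

```python
from collections import deque


class SlidingWindowClusters:
    """Hypothetical leader-style clustering over a sliding window (NOT TMSC)."""

    def __init__(self, window_size, alpha):
        self.window_size = window_size  # number of most recent points kept
        self.alpha = alpha              # assumed distance threshold for joining a cluster
        self.window = deque()           # (point, cluster_id) pairs, oldest first
        self.clusters = {}              # cluster_id -> {"sum": [...], "count": n}
        self._next_id = 0

    def _centroid(self, cid):
        c = self.clusters[cid]
        return [s / c["count"] for s in c["sum"]]

    def _nearest(self, point):
        # Linear scan over current clusters using Chebyshev (max-coordinate) distance.
        best_id, best_dist = None, float("inf")
        for cid in self.clusters:
            dist = max(abs(p - q) for p, q in zip(point, self._centroid(cid)))
            if dist < best_dist:
                best_id, best_dist = cid, dist
        return best_id, best_dist

    def add(self, point):
        """Process one new multi-dimensional point and return its cluster id."""
        # Expire the oldest point once the window is full; only its own
        # cluster summary is adjusted, nothing else is recomputed.
        if len(self.window) == self.window_size:
            old_point, old_cid = self.window.popleft()
            c = self.clusters[old_cid]
            c["count"] -= 1
            c["sum"] = [s - p for s, p in zip(c["sum"], old_point)]
            if c["count"] == 0:
                del self.clusters[old_cid]

        # Incremental step: only the newly arrived point is assigned.
        cid, dist = self._nearest(point)
        if cid is None or dist > self.alpha:
            cid = self._next_id
            self._next_id += 1
            self.clusters[cid] = {"sum": [0.0] * len(point), "count": 0}
        c = self.clusters[cid]
        c["count"] += 1
        c["sum"] = [s + p for s, p in zip(c["sum"], point)]
        self.window.append((point, cid))
        return cid
```

Each call to add touches only the expiring point and the arriving point, which is the property the incremental stage relies on. Points already in the window keep their original assignment in this sketch; a real algorithm may need to revisit them.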
Keywords/Search Tags:Data streams, Clustering analysis, Data mining, Temporal data streams, TMSC