The Application And Research Of Incremental Clustering On Temporal Data Streams

Posted on:2010-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Zhang

Full Text:PDF

GTID:2178360272478962

Subject:Management Science and Engineering

Abstract/Summary:

Cluster analysis is an important area of data mining. In recent years, with the rapid development of computer technology, the ability to acquire data has improved greatly, and the ways of acquiring data have multiplied. The data stream, as a special kind of data source, has attracted increasing attention. There are many kinds of data streams, such as web clickstreams, weather information, telephone call records, and satellite data streams. Because a data stream is unbounded and its data cannot be accessed multiple times, traditional algorithms cannot handle it, and new algorithms must be developed. As a result, computing practitioners face new challenges.

This thesis studies temporal data streams. It gives the concept and definition of a data stream and proposes TMSC (temporal multiple-dimension subspace α-cluster), a clustering algorithm that finds clusters based on the subspace α-cluster. TMSC uses a sliding window so that all the data need not be processed at the same time. The algorithm also has a maintenance stage, called the incremental stage, in which old data need not be recalculated; only newly arrived data is processed. As a result, an incremental update takes less time than rerunning a traditional algorithm. The last part of the thesis applies the algorithm to stock data, using different sets of parameters to find a number of different clusters. The results appear meaningful.

The main innovations of the TMSC algorithm are: 1) extending from one-dimensional data streams to multi-dimensional data streams; 2) improving cluster pruning; 3) giving a clear definition of how (m+1)-level clusters are found from m-level clusters, and proving it; 4) fixing the original algorithm, which misses clusters in the incremental update stage, by retaining all clusters.
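
The sliding-window and incremental-update idea described in the abstract can be illustrated with a minimal Python sketch. This is not the TMSC algorithm, whose details the abstract does not give; it only shows, for a simple running mean, how a window summary can be updated from the newly arrived value alone without recomputing over old data. The names `SlidingWindowMean` and `stream_means` are hypothetical and chosen for illustration.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the last `size` values of a stream, maintained incrementally.

    Illustrative only: a stand-in for a window summary, not the TMSC algorithm.
    """

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()  # values currently inside the window
        self.total = 0.0       # running sum of the values in the window

    def add(self, value):
        # Incremental step: only the arriving value and the value that
        # slides out of the window are touched; nothing else is recomputed.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def stream_means(stream, size):
    """Return the window mean after each arrival in `stream`."""
    summary = SlidingWindowMean(size)
    return [summary.add(x) for x in stream]
```

For example, `stream_means([1, 2, 3, 4], 2)` processes each value once and returns the mean of the most recent two values after every arrival. A clustering algorithm would keep a richer summary than a sum, but the same pattern of updating on arrival and expiry applies.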

Keywords/Search Tags:

Data streams, Clustering analysis, Data mining, Temporal data streams, TMSC

Related items

1	The Research And Realization Of Clustering Algorithm In Data Streams Mining
2	Algorithms For Data Streams Based On Shielding/Summarizing
3	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
4	Research On Mining Algorithms Over Data Streams
5	Research And Implementation On Clustering Algorithms In Uncertain Data Streams Environment
6	Research On Technique And Application Of Mining Data Streams
7	Research On Real-time Data Streams Clustering Framework
8	Optimal Data Streams Clustering Algorithm Based On N-δ Sliding Window Model
9	Research On Classification Technologies In Mining Unsteady Data Streams
10	Mining Association Rules In Data Streams