Research On Density Data Stream Clustering Algorithm Based On Sliding Window

Posted on: 2012-01-13; Degree: Master; Type: Thesis
Country: China; Candidate: Y Gong; Full Text: PDF
GTID: 2218330338470830; Subject: Computer software and theory
Abstract/Summary:
In recent years, rising living standards and the rapid adoption of computer applications across industries have greatly improved people's ability to obtain data, and the number of accessible data sources has increased dramatically. As information systems have developed, much everyday data has taken on the characteristics of a "stream", and the traditional approach of storing data in static structures is no longer applicable. As an important kind of data source, data streams have received growing attention from researchers. Unlike data in a traditional database, a data stream has several peculiarities: the total amount of data is unbounded, data arrive quickly at a rate that can be neither controlled nor predicted, and data arrive out of order. Given these features, making the data understandable and usable for analysis urgently requires efficient and accurate algorithms designed for data streams.
Researchers have done a great deal of work on data stream clustering, and many good data stream algorithms have been proposed. The main clustering methods are partition-based, hierarchical, grid-based, and model-based. Representative partition-based algorithms are k-means and k-medoids (the k-center method). These algorithms compute distances to assign each data point to its nearest center, and repeatedly recompute the distances between the points and the centers until a stable state is reached. They are suited to roughly spherical clusters and to small or medium-sized databases; finding clusters of complex shape or clustering large data sets requires extending the partition-based methods. Hierarchical methods are mainly bottom-up (agglomerative) or top-down (divisive). To save cost, these methods follow a strict procedure in which a completed step cannot be undone, which is also a defect of this approach.
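The partition-based step described above (assign each point to its nearest center, then update until stable) can be sketched as follows. This is a minimal illustration of plain k-means, not code from the thesis; the function name `kmeans`, the `max_iter` parameter, and the choice of the first `k` points as initial centers are assumptions made for the example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples (illustrative)."""
    # Assumption: use the first k points as the initial centers.
    centers = list(points[:k])
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[i])  # empty cluster keeps its center
        if new_centers == centers:  # stable state: no center moved
            break
        centers = new_centers
    # `clusters` holds the result of the last assignment step.
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

Because every point is attached to its nearest center by Euclidean distance, the resulting clusters tend to be roughly spherical, which is the limitation the abstract notes for this family of methods.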
To overcome the restriction of distance-based methods to round clusters and to filter out outliers, density-based clustering methods were proposed. In these methods, as long as the number of points in a point's neighborhood exceeds a given threshold, the cluster keeps growing, until no point satisfies the given condition. Grid-based algorithms treat the data space as a multidimensional grid and map all data points into it. They do not need to consider individual data points, only the number of points or the density in each grid cell, which speeds up clustering. Their disadvantage is that as the dimensionality of the data increases, the time and space costs of the algorithm increase dramatically. Model-based algorithms assume a model for each cluster and look for the data that best fit the given model.
This thesis studies the data stream problem and several classic clustering algorithms in depth, and makes the following contributions:
(1) Drawing on the merits of the two-tier structure, it proposes DStream, an algorithm based on a sliding window and density.
(2) It proposes an algorithm framework based on a sliding window and time decay ("time recession").
(3) It verifies the validity of the algorithm experimentally.
Experiments on the KDDCUP99 data set show that DStream effectively improves the accuracy of the results and reduces space and time costs. The experiments compare DStream, CluStream, and a variant of DStream with the computation of the time-decay weight removed; DStream obtains better results than CluStream. Clustering accuracy is measured by comparing SSQ values, and time consumption is compared across samples of different sizes and data sets of different dimensionality.
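The ideas named in the abstract (grid cells, density, a sliding window, time decay, and the SSQ measure) can be illustrated with a short sketch. This is not the thesis's DStream algorithm, whose details the abstract does not give. The class `DecayedGridWindow`, its parameters, and the decay function 2^(-decay * age) are hypothetical choices for illustration; `ssq` uses the common definition of SSQ as the sum of squared distances from each point to its nearest cluster center.

```python
import math
from collections import defaultdict, deque


class DecayedGridWindow:
    """Decayed per-cell density over the most recent `window` points.

    Illustrative only; names and the decay function are assumptions.
    """

    def __init__(self, cell_size, window, decay=0.01):
        self.cell_size = cell_size
        self.window = window
        self.decay = decay      # hypothetical decay rate
        self.buffer = deque()   # (timestamp, cell) pairs inside the window

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def add(self, timestamp, point):
        # Keep only the newest `window` points (count-based sliding window).
        self.buffer.append((timestamp, self._cell(point)))
        if len(self.buffer) > self.window:
            self.buffer.popleft()

    def densities(self, now):
        # Each point contributes a weight that shrinks with its age.
        dens = defaultdict(float)
        for t, cell in self.buffer:
            dens[cell] += 2.0 ** (-self.decay * (now - t))
        return dict(dens)

    def dense_cells(self, now, threshold):
        # Cells whose decayed density reaches the threshold.
        return {c for c, d in self.densities(now).items() if d >= threshold}


def ssq(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)


grid = DecayedGridWindow(cell_size=1.0, window=3, decay=1.0)
for t, p in enumerate([(0.5, 0.5), (0.2, 0.9), (5.1, 5.1), (0.1, 0.1)]):
    grid.add(t, p)
dens = grid.densities(now=3)
dense = grid.dense_cells(now=3, threshold=1.0)
score = ssq([(0, 0), (2, 0), (10, 0)], [(1, 0), (10, 0)])
```

Only per-cell weights are examined when deciding which cells are dense, which reflects the abstract's point that grid methods work with cell densities rather than individual points. A lower SSQ value indicates tighter clusters.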
Keywords/Search Tags:data stream, density, clustering, time recession, sliding window