Stream-based Clustering Algorithm

Posted on:2010-11-03

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2208360278469504

Subject:Computer application technology

Abstract/Summary:

In recent years, computer application technology has developed rapidly, and people's ability to access and obtain data has improved. As an important data source, the data stream has attracted increasing attention, and clustering algorithms for data streams have become an important research topic.

Unlike traditional databases, a data stream has the following characteristics: the volume of data is unbounded, data arrive at a high rate, and the arrival order of tuples cannot be controlled. Because of these characteristics, a high-quality clustering algorithm is essential for obtaining accurate results.

This thesis presents an improved dual-tier data stream clustering algorithm named HSCS, which is divided into a fast-calculation layer and an accurate-calculation layer. The fast-calculation layer collects and pre-processes the data stream and is the basis of the dual-tier algorithm. It uses sliding windows of equal time span: a hash function samples the data in each window, the samples are processed into summary information about the data stream, and this summary is passed to the accurate-calculation layer. The accurate-calculation layer is the offline analysis part of the algorithm, so it has more freedom to apply different methods to obtain accurate clustering results. It takes the data sampled by the fast-calculation layer as its data source and, to obtain a better final result, clusters them with DBSCAN, a density-based clustering algorithm.

Experimental results on real data sets show that, by analyzing sampled data, the algorithm reflects the overall distribution of the data stream while also reducing storage requirements, and that it is feasible and effective.
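The two layers described in the abstract can be illustrated with a minimal Python sketch. This is not the HSCS implementation: the abstract does not specify the hash function, the window span, the sampling rate, or the DBSCAN parameters, so the MD5-based `keep_by_hash`, the `span` and `rate` arguments, and the `eps` and `min_pts` values are all assumptions chosen for illustration. For brevity the fast layer here only groups points into equal-time-span buckets; a sliding window would additionally discard the oldest buckets.

```python
import hashlib
import math
from collections import defaultdict


def keep_by_hash(point, rate):
    """Deterministically keep roughly `rate` of the points by hashing them.
    (Assumed sampling rule; the abstract does not name the hash function.)"""
    digest = hashlib.md5(repr(point).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


def sample_windows(stream, span, rate):
    """Fast layer (assumed form): group (timestamp, point) tuples into
    equal-time-span windows and keep a hash-selected sample of each."""
    windows = defaultdict(list)
    for timestamp, point in stream:
        if keep_by_hash(point, rate):
            windows[int(timestamp // span)].append(point)
    return dict(windows)


def dbscan(points, eps, min_pts):
    """Accurate layer: plain O(n^2) DBSCAN. Returns one label per point,
    with -1 for noise and 0, 1, ... for clusters."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(nbrs)
    return labels
```

In use, the output of `sample_windows` for the windows of interest would be flattened into one list of points and passed to `dbscan`. Hashing the point itself makes the sample reproducible across runs, at the cost of always keeping or always dropping repeated identical points.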

Keywords/Search Tags:

Data stream, Sliding window, Clustering algorithm

Related items

1	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
2	Research On Density Data Stream Clustering Algorithm Based On Sliding Window
3	Based On Sliding Window And The Grid Density Data Stream Clustering Algorithm Research
4	Research On Uncertain Data Stream Clustering Method Based On Variable Sliding Window
5	Data Stream Processing Algorithm Based On Cluster Analysis
6	Research On Fuzzy Clustering Algorithm For Data Stream
7	Stream-based Clustering Algorithm
8	Research On Density-based Subspace Clustering Algorithm For Data Streams
9	Research On Density-Based Subspace Clustering Algorithm For Data Streams
10	Research On Clustering Of Stream Data Based On Sliding Window