Research And Implementation Of Data Stream Cluster Based On Density And Grid

Posted on: 2010-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: H K Wang
GTID: 2178360302460398
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, our ability to acquire knowledge has grown steadily. In recent years, the emergence of wireless sensor networks, routers, and similar equipment has made it possible to collect far more data, and has given rise to a new data model: the data stream. In this model, data are no longer static or permanently stored. Instead, records are read sequentially, and each record can be accessed only once or a limited number of times.

Current research on data stream mining focuses mainly on clustering, classification, and frequent pattern mining. In this thesis we study traditional clustering methods based on density and on grids. We find that traditional density-based clustering has to scan the data many times, while traditional grid-based clustering scans the data only once and processes it quickly but has poor accuracy. Neither approach on its own therefore satisfies the requirements of data stream clustering.

In this thesis we combine ideas from several traditional clustering algorithms and propose an algorithm named DG-Tree (Density and Grid-Tree Algorithm), which is based on a tree synopsis structure. The algorithm has two parts: a micro-cluster component and a macro-cluster component. The micro-cluster component maps each incoming data record into a tree, which eliminates the effect of empty grid cells on the clustering results. Because data stream clustering is usually concerned with recent data, we introduce a time decay model that makes it possible to find recent cluster information. The macro-cluster component computes the density of the tree's leaf nodes and clusters the leaf nodes according to that density. It identifies noise nodes by means of a noise density threshold function and reduces computation by means of an update cycle. Experiments on the KDD Cup 99 data set show that this algorithm is more efficient than DBSCAN and CluStream.
Keywords/Search Tags: Data Stream, Cluster, Grid, Density
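
The general idea behind the abstract (map each record to a grid cell, decay cell densities over time, then group dense cells) can be illustrated with a minimal Python sketch. This is not the DG-Tree algorithm itself: a plain dictionary stands in for the thesis's tree synopsis, a constant threshold stands in for its noise density threshold function, and the class name `DecayedGrid`, the exponential decay form, and the parameters `cell_width`, `decay`, and `dense_threshold` are all assumptions made for illustration.

```python
class DecayedGrid:
    """Sparse grid with exponentially decayed cell densities (illustrative only)."""

    def __init__(self, cell_width, decay=0.998):
        self.cell_width = cell_width  # assumed uniform cell size in every dimension
        self.decay = decay            # assumed decay factor per time unit, 0 < decay < 1
        self.density = {}             # cell -> density as of its last update
        self.last_seen = {}           # cell -> time of its last update

    def cell_of(self, point):
        # Map a point to the integer coordinates of the cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t):
        # Single pass: decay the cell's old density to time t, then add the new record.
        # Only cells that receive data are stored, so empty cells cost nothing.
        cell = self.cell_of(point)
        old = self.density.get(cell, 0.0)
        age = t - self.last_seen.get(cell, t)
        self.density[cell] = old * self.decay ** age + 1.0
        self.last_seen[cell] = t

    def density_at(self, cell, t):
        # Density of a stored cell, decayed from its last update to time t.
        return self.density[cell] * self.decay ** (t - self.last_seen[cell])

    def clusters(self, t, dense_threshold):
        # Cells below the threshold are treated as noise; the remaining cells are
        # grouped into clusters by flood fill over face-adjacent neighbours.
        dense = {c for c in self.density if self.density_at(c, t) >= dense_threshold}
        groups, seen = [], set()
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            group, stack = [], [start]
            while stack:
                cell = stack.pop()
                group.append(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            groups.append(sorted(group))
        return sorted(groups)


if __name__ == "__main__":
    grid = DecayedGrid(cell_width=1.0, decay=0.5)
    for p in [(0.2, 0.3), (0.4, 0.1), (1.5, 0.5), (1.6, 0.2), (5.5, 5.5)]:
        grid.insert(p, t=0)
    print(grid.clusters(t=0, dense_threshold=1.5))
    print(grid.clusters(t=2, dense_threshold=1.5))
```

In this sketch `clusters` would be called only periodically rather than after every record, which is one plausible reading of the update cycle mentioned in the abstract; the thesis may define it differently.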