Study Of Cluster Algorithm Of Data Stream

Posted on:2013-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:P An

Full Text:PDF

GTID:2268330392969078

Subject:Computer Science and Technology

Abstract/Summary:


Traditional data mining algorithms are designed for simple, structured data, most of which is static. However, much data exists in the form of data streams. A data stream is massive, arrives continuously, and changes rapidly. These characteristics prevent traditional data mining algorithms from adapting to this kind of data, so new stream-oriented mining algorithms are needed. Clustering is an important part of data stream mining, which makes research on these methods important. This thesis presents a data stream clustering algorithm.

Many well-known scholars have proposed new data stream clustering algorithms that extend and improve existing traditional clustering methods. Some of these algorithms achieve good clustering results. However, because of inappropriate parameter settings or inherent defects of the algorithms, they still have shortcomings.

The algorithm in this thesis also extends a traditional algorithm to data stream clustering. It studies data stream clustering based on grid density, combining density-based and grid-based algorithms. It therefore has the advantages of both methods: high speed and high accuracy.

The algorithm is based on the D-Stream algorithm. It retains the advantages of D-Stream and improves on the original algorithm. First, it changes some D-Stream parameters so that they adapt dynamically to changes in the grid. These parameters can be set without prior knowledge. The parameter settings draw on other papers but are used after being optimized and simplified, and they are proved to be correct and efficient. Then, the offline part of the clustering consists of three parts: algorithms based on disjoint sets, breadth-first search, and time gap. The disjoint-set and breadth-first algorithms have some significance in engineering practice and can serve as part of the time-gap algorithm. The offline part also makes some improvements to the original D-Stream algorithm in efficiency and precision.

Finally, a series of experiments is carried out on the algorithm using the KDD99 dataset. The experiments cover two aspects. First, the best performance of the algorithm is obtained by adjusting the relevant parameters. Second, the correctness and effectiveness of the algorithm are demonstrated by comparison with the D-Stream algorithm and the NDD-Stream algorithm.
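
The abstract does not give the thesis's formulas or parameter values, so the following is only a minimal Python sketch of the general idea behind grid-density stream clustering with a disjoint-set offline step. The online part maps each point to a grid cell and keeps a decayed density, using the update commonly stated for D-Stream (old density multiplied by a decay factor raised to the elapsed time, plus one for the new point). The offline part merges dense cells that are adjacent along one axis with a disjoint-set structure. The names `GridDensity`, `DisjointSet`, `cluster_dense_cells`, and the values of `cell_width`, `decay`, and `dense_threshold` are assumptions made for illustration; the thesis's dynamic parameters and its time-gap step are not reproduced here.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


class GridDensity:
    """Online part: per-cell density with exponential decay (assumed form)."""

    def __init__(self, cell_width=1.0, decay=0.998):
        self.cell_width = cell_width  # hypothetical fixed cell size
        self.decay = decay            # hypothetical decay factor in (0, 1)
        self.density = {}             # cell -> density at its last update
        self.last_update = {}         # cell -> time of its last update

    def cell_of(self, point):
        return tuple(int(c // self.cell_width) for c in point)

    def insert(self, point, t):
        g = self.cell_of(point)
        old = self.density.get(g, 0.0)
        dt = t - self.last_update.get(g, t)
        # Decay the stored density to time t, then count the new point.
        self.density[g] = old * self.decay ** dt + 1.0
        self.last_update[g] = t

    def density_at(self, g, t):
        return self.density[g] * self.decay ** (t - self.last_update[g])


def cluster_dense_cells(grid, t, dense_threshold):
    """Offline part: union dense cells that differ by 1 along one axis."""
    dense = {g for g in grid.density
             if grid.density_at(g, t) >= dense_threshold}
    ds = DisjointSet()
    for g in dense:
        ds.find(g)  # register the cell even if it has no dense neighbour
        for axis in range(len(g)):
            # Checking only the +1 neighbour is enough: adjacency is symmetric.
            nb = g[:axis] + (g[axis] + 1,) + g[axis + 1:]
            if nb in dense:
                ds.union(g, nb)
    clusters = defaultdict(set)
    for g in dense:
        clusters[ds.find(g)].add(g)
    return list(clusters.values())
```

In this sketch a breadth-first search over the same neighbour relation would yield the same groups of cells; the disjoint-set version is shown because it can absorb new dense cells incrementally without restarting a traversal.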

Keywords/Search Tags:

data mining, data stream, clustering, grid-density, decay window


Related items

1	Adaptive Evolving Data Stream Algorithm Based On Time Decay Window
2	Research On Data Stream Clustering Algorithm Based On Density Grid
3	Based On Sliding Window And The Grid Density Data Stream Clustering Algorithm Research
4	Research On Data Stream Clustering Algorithm Based On Density Grid Over Sliding Window
5	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
6	The Research Of Clustering Algorithm Based On Data Stream
7	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
8	Research On Data Stream Clustering Algorithm Based On Grid And Density
9	Research On Density Data Stream Clustering Algorithm Based On Sliding Window
10	Research On Fuzzy Clustering Algorithm For Data Stream