
Study Of Cluster Algorithm Of Data Stream

Posted on: 2013-12-27  Degree: Master  Type: Thesis
Country: China  Candidate: P An  Full Text: PDF
GTID: 2268330392969078  Subject: Computer Science and Technology
Abstract/Summary:
Traditional data mining algorithms are designed for simple, structured data, most of which is static. However, much data now arrives in the form of data streams. A data stream is massive, arrives continuously, and changes rapidly. These characteristics prevent traditional data mining algorithms from handling this kind of data, so new stream-oriented mining algorithms are needed. Clustering is one of the most important problems in data stream mining, which makes research on stream clustering methods important. This thesis presents a data stream clustering algorithm.

Many well-known researchers have proposed new algorithms for data stream clustering by extending and improving existing traditional clustering methods. Some of these algorithms cluster well. However, because of inappropriate parameter settings or inherent defects, they still have shortcomings.

The algorithm in this thesis is likewise extended from a traditional algorithm to handle data stream clustering. It is a grid-density-based data stream clustering algorithm that combines density-based and grid-based clustering, and so has the advantages of both: high speed and high accuracy.

The algorithm is based on the D-Stream algorithm. It retains the advantages of D-Stream and improves on the original. First, it changes some D-Stream parameters so that they adapt dynamically to changes in the grid. These parameters can be set without prior knowledge. The parameter settings draw on other papers but are optimized and simplified before use, and they are shown to be correct and efficient. Second, the offline clustering part consists of three algorithms, based respectively on disjoint sets, breadth-first search, and time gaps.
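To illustrate the disjoint-set idea, the sketch below groups adjacent dense grid cells into clusters with a union-find structure. It is a minimal illustration and not the thesis's implementation. The names `DisjointSet` and `cluster_dense_cells`, the single `dense_threshold` parameter, and the rule that two cells are neighbors when they differ by one step in exactly one dimension are all assumptions made for the example. Transitional and sparse cells are ignored for simplicity.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, item):
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1

    def find(self, item):
        # Walk up to the root of the set containing item.
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_dense_cells(densities, dense_threshold):
    """Group dense grid cells into clusters of mutually reachable neighbors.

    densities maps a cell (tuple of integer grid coordinates) to its density.
    dense_threshold is a hypothetical cutoff; cells below it are skipped.
    """
    dense = [cell for cell, d in densities.items() if d >= dense_threshold]
    dense_set = set(dense)
    ds = DisjointSet()
    for cell in dense:
        ds.add(cell)
    for cell in dense:
        # Checking only the +1 neighbor per dimension covers every adjacent
        # pair once, because the -1 neighbor checks its own +1 neighbor.
        for dim in range(len(cell)):
            neighbor = cell[:dim] + (cell[dim] + 1,) + cell[dim + 1:]
            if neighbor in dense_set:
                ds.union(cell, neighbor)
    groups = defaultdict(list)
    for cell in dense:
        groups[ds.find(cell)].append(cell)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    example = {(0, 0): 5.0, (0, 1): 4.2, (1, 1): 3.9, (5, 5): 6.0, (5, 6): 0.3}
    print(cluster_dense_cells(example, dense_threshold=3.0))
```

With union by size and path compression, each merge is close to constant time, so one pass over the dense cells is enough to label all clusters. A breadth-first traversal over the same neighbor relation would produce the same groups.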
The disjoint-set and breadth-first algorithms have practical engineering significance and can serve as part of the time-gap algorithm. The offline part also includes improvements that optimize the efficiency and precision of the original D-Stream algorithm.

Finally, a series of experiments is carried out on the KDD99 dataset. The experiments address two aspects. First, the best performance of the algorithm is obtained by tuning the relevant parameters. Second, the correctness and effectiveness of the algorithm are demonstrated by comparison with the D-Stream algorithm and the NDD-Stream algorithm.
Keywords/Search Tags:data mining, data stream, clustering, grid-density, decay window