Research On Data Stream Clustering Based On Grid And Density

Posted on: 2013-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: X P Li
Full Text: PDF
GTID: 2248330392957824
Subject: Computer software and theory
Abstract/Summary:
Network intrusion detection, real-time monitoring systems, and user click streams on the web continuously generate time-bounded, large-scale, fast-changing, and unbounded data streams, so data mining over data streams is an important and useful research area. Clustering is a central problem in data mining and has been widely studied. However, the data stream model differs from the traditional static data set, which creates new demands and challenges.

This thesis reviews traditional clustering methods and finds that existing data stream clustering algorithms such as CluStream are based on the k-means algorithm. These algorithms are not suited to finding clusters of arbitrary shape and cannot handle outliers. Furthermore, they require the number of clusters k and a user-specified time window. Clustering methods based on grids and density, by contrast, have several properties that suit data stream processing and make data stream clustering easier to implement. This thesis therefore studies traditional density-based algorithms and proposes GDCLUS, which takes the dynamic nature of the data into account. The algorithm uses an online component to map every incoming data record to a grid, while an offline component clusters the grids using a minimum spanning tree. It uses a density decay technique to capture dynamic changes in the data stream. By exploiting the relationship between the decay factor, data density, and cluster structure, the algorithm can effectively generate and adjust clusters. In addition, an improved time framework is used to select data online, which improves the space and time efficiency of clustering and makes data stream clustering more feasible without reducing clustering quality. The experimental results show that the algorithm achieves good clustering quality and efficiency, can discover clusters of arbitrary shape, and can detect the evolving behavior of a data stream.

Finally, this thesis tests the relevant functions of the algorithm in a real data stream application and conducts experiments on the KDD Cup 99 data set, which is used for network intrusion detection, demonstrating the feasibility of the algorithm.
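
The abstract does not give the details of GDCLUS, so the following is only a minimal Python sketch of the general idea it describes: an online step that maps records to grid cells with exponentially decayed density, and an offline step that groups dense cells with a minimum spanning tree. The names DecayingGrid and mst_clusters, the parameters cell_size, decay, threshold, and max_edge, and the exponential form of the decay are all assumptions made for illustration, not the thesis's actual definitions.

```python
import math
from collections import defaultdict


class DecayingGrid:
    """Online step (illustrative): map records to cells, keep decayed densities."""

    def __init__(self, cell_size, decay=0.998):
        self.cell_size = cell_size   # assumed uniform cell width
        self.decay = decay           # assumed decay factor in (0, 1)
        self.cells = {}              # cell index -> (density, last update time)

    def cell_of(self, point):
        # Integer grid coordinates of the cell containing the point.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point, t):
        key = self.cell_of(point)
        density, last = self.cells.get(key, (0.0, t))
        # Age the stored density to time t, then count the new record.
        self.cells[key] = (density * self.decay ** (t - last) + 1.0, t)

    def density(self, key, now):
        density, last = self.cells[key]
        return density * self.decay ** (now - last)

    def dense_cells(self, now, threshold):
        return [k for k in self.cells if self.density(k, now) >= threshold]


def mst_clusters(cells, max_edge):
    """Offline step (illustrative): Kruskal's MST over cells, long edges cut.

    Stopping Kruskal's algorithm at max_edge yields the same components as
    building the full minimum spanning tree and removing every longer edge.
    """
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # All pairwise edges, weighted by distance between cell indices.
    # Quadratic in the number of dense cells; acceptable for a sketch.
    edges = sorted(
        (math.dist(a, b), a, b)
        for i, a in enumerate(cells)
        for b in cells[i + 1:]
    )
    for weight, a, b in edges:
        if weight > max_edge:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for c in cells:
        groups[find(c)].add(c)
    return list(groups.values())
```

In this sketch, a caller would feed each record to insert as it arrives and periodically run mst_clusters on the output of dense_cells. Cells whose decayed density falls below the threshold drop out of the clustering, which is one simple way to treat sparse regions as outliers and to let clusters change as the stream evolves.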
Keywords/Search Tags: data stream, cluster, web intrusion detection, minimum spanning tree