| Cluster analysis is a very active research area in data stream mining. It groups similar objects together and separates dissimilar ones, following the principle of maximizing intra-cluster similarity and minimizing inter-cluster similarity. Many clustering algorithms have been proposed to discover clustering patterns in different domains, but many of them re-cluster the entire data space whenever a clustering request arrives, which increases their time complexity. Because real-time data streams arrive at high speed and in huge volumes, mining algorithms must process data quickly and respond to user requests in real time, and full re-clustering is poorly suited to this setting. To speed up clustering by reusing existing clustering results, we developed PDStream, an incremental clustering algorithm based on grids and a density-dimensional degree tree. PDStream is derived from IGDStream, a real-time data stream clustering algorithm based on an attenuation window and a dimensional tree. PDStream clusters the newly arrived data together with the previous clustering result. This incremental clustering is achieved by storing the clustering result in a density dimensional tree and updating the tree according to the density of the grids. The method improves mining efficiency by avoiding re-processing the entire data stream. Experiments on multiple datasets show that PDStream can discover clusters of arbitrary shape in the data stream in the presence of noise, thereby improving the clustering ability of the algorithm. |
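The abstract does not define the density dimensional tree, so the following is only a minimal sketch of the general idea of incremental grid-density clustering, under stated assumptions: points are mapped to fixed-size grid cells, each cell's density decays exponentially over time (a stand-in for the attenuation window), and the previous clustering result is reused when cells newly become dense. The names `DecayedGrid`, `IncrementalGridClusterer`, `decay`, and `dense_threshold` are hypothetical, and a union-find over adjacent dense cells is swapped in for the authors' density dimensional tree; this is not the PDStream implementation.

```python
from collections import defaultdict


class DecayedGrid:
    """Hypothetical grid whose cell densities decay exponentially with time."""

    def __init__(self, cell_size, decay=0.9, dense_threshold=2.0):
        self.cell_size = cell_size
        self.decay = decay                  # assumed per-time-step decay factor
        self.dense_threshold = dense_threshold
        self.density = {}                   # cell -> density at its last update
        self.last_update = {}               # cell -> time of its last update

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_size) for x in point)

    def density_at(self, cell, t):
        # Lazy decay: apply decay^(elapsed time) only when the cell is read.
        if cell not in self.density:
            return 0.0
        return self.density[cell] * self.decay ** (t - self.last_update[cell])

    def add(self, point, t):
        cell = self.cell_of(point)
        self.density[cell] = self.density_at(cell, t) + 1.0
        self.last_update[cell] = t
        return cell

    def dense_cells(self, t):
        return {c for c in self.density if self.density_at(c, t) >= self.dense_threshold}


def neighbors(cell):
    # Cells sharing a face: differ by 1 in exactly one dimension.
    for d in range(len(cell)):
        for step in (-1, 1):
            yield cell[:d] + (cell[d] + step,) + cell[d + 1:]


class IncrementalGridClusterer:
    """Hypothetical clusterer: union-find over adjacent dense cells."""

    def __init__(self, grid):
        self.grid = grid
        self.parent = {}  # dense cell -> parent in the union-find forest

    def _find(self, c):
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]  # path halving
            c = self.parent[c]
        return c

    def _attach(self, cell):
        # Add a dense cell and merge it with every adjacent dense cell.
        self.parent.setdefault(cell, cell)
        for nb in neighbors(cell):
            if nb in self.parent:
                ra, rb = self._find(cell), self._find(nb)
                if ra != rb:
                    self.parent[rb] = ra

    def update(self, t):
        dense = self.grid.dense_cells(t)
        old = set(self.parent)
        if old - dense:
            # Some cell lost its dense status, so a cluster may split: rebuild.
            self.parent = {}
            for cell in dense:
                self._attach(cell)
        else:
            # Only additions: merge new dense cells into the previous result.
            for cell in dense - old:
                self._attach(cell)
        return self.clusters()

    def clusters(self):
        groups = defaultdict(set)
        for cell in self.parent:
            groups[self._find(cell)].add(cell)
        return sorted(sorted(g) for g in groups.values())
```

In this sketch, the previous result is reused only while the set of dense cells grows; when decay makes a cell sparse, the code falls back to a full rebuild, because a union-find cannot split a cluster. Handling splits without a rebuild is the kind of update a dedicated structure such as the paper's tree would need to support.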