Research Of Data Stream Clustering Methods Based On Density

Posted on: 2018-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S W Li
GTID: 2348330518998939
Subject: Computer software and theory
Abstract/Summary:
With the spread of networks and the large-scale deployment of sensors, research on data stream clustering algorithms has attracted growing attention. A data stream is continuous, arrives in real time, and arrives in chronological order, so traditional clustering methods cannot be applied to it directly. Researchers have done a great deal of work on clustering data streams and have proposed a number of methods. However, because data streams are complex and diverse, the performance of these algorithms must be improved to meet new conditions and demands. The existing algorithms have the following shortcomings: the accuracy of the clustering results is not high enough, the clustering accuracy on multidimensional data is low, and clustering in a distributed environment remains difficult. This thesis studies clustering algorithms for data streams and aims to design effective and efficient density-based clustering algorithms for them. The main work is as follows:

1. A new data stream clustering algorithm based on a star grid is proposed (GDH-Stream). Firstly, traditional density-based clustering algorithms tend to ignore the spatial distribution of the data, which can lead to inaccurate clustering. To overcome this shortcoming, we propose a strategy that uses the spatial distribution of the data points to improve the accuracy of the clustering results. Secondly, to overcome the inefficiency of grid clustering caused by the growth in the number of grid cells in multidimensional space, we design a new star structure suited to the grid, which reduces the number of nodes in the grid list and improves the running time of the algorithm. Thirdly, we improve the structure of the micro-cluster feature tree to reduce the number of times the algorithm scans the data set, which improves its efficiency. Finally, we analyze the time and space complexity of the proposed algorithm and validate its effectiveness through a series of experiments.

2. A new distributed data stream clustering algorithm based on improved grid density is proposed (DGDH-Stream). Firstly, data stream clustering in a distributed environment is divided into local clustering and global clustering, and a distributed data stream clustering model is proposed to improve the scalability of the algorithm. Secondly, we design a new compression technique for micro-clusters, which compresses the micro-clusters generated during local clustering to reduce the load on the distributed system. Finally, we analyze the time and space complexity of the proposed algorithm. DGDH-Stream is compared with the state-of-the-art algorithm DBDC in terms of performance and clustering quality. The results show that the proposed algorithm outperforms the baseline.
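
The abstract does not give the details of the star grid structure or the micro-cluster feature tree, so the following is only a minimal, generic sketch of density-grid clustering on a stream, not GDH-Stream itself. It assumes a fixed cell width, an exponential fading factor, and a single density threshold; the class name `GridStreamClusterer` and all of its parameters are hypothetical. Only non-empty cells are stored, which illustrates in a simple way why keeping the grid list small matters in multidimensional space.

```python
from collections import defaultdict


class GridStreamClusterer:
    """Toy density-grid clusterer for a stream of d-dimensional points.

    Illustrative only: the names and parameters are assumptions, not the
    thesis's GDH-Stream design.
    """

    def __init__(self, cell_width=1.0, decay=0.99, dense_threshold=2.0):
        self.cell_width = cell_width
        self.decay = decay                    # fading factor per time step
        self.dense_threshold = dense_threshold
        self.density = defaultdict(float)     # non-empty cells only
        self.last_update = {}                 # cell -> time of last insert

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t):
        # Fade the cell's old density, then add the new point's weight.
        cell = self._cell(point)
        elapsed = t - self.last_update.get(cell, t)
        self.density[cell] = self.density[cell] * self.decay ** elapsed + 1.0
        self.last_update[cell] = t

    def _current_density(self, cell, t):
        return self.density[cell] * self.decay ** (t - self.last_update[cell])

    def clusters(self, t):
        # A cluster is a connected group of dense, face-adjacent cells.
        dense = {c for c in self.density
                 if self._current_density(c, t) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, stack = set(), [start]
            while stack:
                cell = stack.pop()
                component.add(cell)
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(component)
        return result


stream = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.5, 0.5),
          (5.2, 5.1), (5.3, 5.9), (9.0, 9.0)]
clusterer = GridStreamClusterer()
for p in stream:
    clusterer.insert(p, t=0)
groups = clusterer.clusters(t=0)
```

In this sketch the two adjacent dense cells form one cluster, the isolated dense cell forms another, and the cell holding a single point is treated as sparse. Checking only the 2d face-adjacent neighbours, rather than all 3^d - 1 surrounding cells, is one simple way to keep the per-cell cost linear in the dimension.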
Keywords/Search Tags:Data mining, Data stream, Density-based, Distributed system