
Research Of Subspace-clustering Algorithms Based On Density Over High-dimensional Data

Posted on: 2013-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: L Y Miao
GTID: 2248330392454853  Subject: Computer application technology
Abstract/Summary:
Cluster analysis has become an important topic in data mining. Existing density-based clustering algorithms can identify clusters of arbitrary shape in different subspaces, but many issues remain unresolved. The cluster quality of most existing density-based algorithms depends directly on the choice of density threshold, and these algorithms cannot handle high-dimensional data well. To address these shortcomings, this thesis focuses on density-based subspace clustering algorithms, an important class of cluster analysis problems with broad applications, including software vulnerability mining, network security, and wireless sensors.

First, this thesis proposes a clustering method for high-dimensional data based on density and a Cluster-Transaction tree (CT-tree). The algorithm builds a CT-tree from the one-dimensional clusters found in each dimension and stores the data objects in the corresponding nodes. It then traverses the tree to find the continuous and discrete paths that constitute the sets of candidate clusters. To improve clustering quality, a new local density threshold technique is used to identify subspace clusters of non-uniform density, which yields the final clustering results.

Second, a clustering method for high-dimensional data streams based on grid density and attribute relativity is presented. The algorithm maps each data object into a grid cell and updates the characteristic vector of that cell. When a clustering request arrives, the most interesting subspaces are generated using a weighted attribute relativity measure. The original grid structure is then projected onto the subspace to form a new grid structure, and clustering is performed on the new grid structure using a density-grid approach.

Finally, both algorithms are implemented, and a detailed analysis of their clustering quality and scalability is given, together with a discussion of the output quality of the two algorithms. Experiments demonstrate that the two proposed algorithms achieve better quality and scalability, meeting the expected goals. The thesis also analyzes the problems that remain.
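The grid-based step of the second method (map objects to grid cells, project onto a chosen subspace, cluster the dense cells) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the fixed `cell_width`, the global `min_points` threshold, and the use of a plain point count in place of the characteristic vector are all assumptions made for illustration, and the weighted attribute relativity measure that selects the subspace is not modeled (the subspace is simply passed in as `dims`).

```python
from collections import defaultdict, deque
from itertools import product


def grid_cell(point, dims, cell_width):
    """Project a point onto the attributes in `dims` and return its cell index."""
    return tuple(int(point[d] // cell_width) for d in dims)


def dense_grid_clusters(points, dims, cell_width, min_points):
    """Group adjacent dense cells of the projected grid into clusters.

    Hypothetical helper: a cell is 'dense' when it holds at least
    `min_points` objects (a single global threshold, assumed here).
    """
    # Count objects per cell; a stand-in for the per-cell characteristic vector.
    counts = defaultdict(int)
    for p in points:
        counts[grid_cell(p, dims, cell_width)] += 1
    dense = {cell for cell, n in counts.items() if n >= min_points}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search over neighbouring dense cells
            cell = queue.popleft()
            component.append(cell)
            # Neighbours differ by at most 1 in every projected dimension.
            # This enumerates 3**len(dims) offsets, so it only suits
            # low-dimensional subspaces.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters


# Made-up data: attributes 0 and 1 carry structure, attribute 2 is noise.
points = [
    (0.1, 0.1, 7.0), (0.2, 0.3, 2.0), (1.1, 0.2, 9.5), (1.3, 0.4, 0.3),
    (5.1, 5.2, 4.4), (5.3, 5.4, 8.8), (9.0, 9.0, 1.0),
]
subspace_clusters = dense_grid_clusters(points, (0, 1), 1.0, 2)
full_space_clusters = dense_grid_clusters(points, (0, 1, 2), 1.0, 2)
```

On this toy data the projection onto attributes 0 and 1 yields two groups of dense cells, while the full three-dimensional grid contains no dense cell at all, which is the kind of effect that motivates clustering in subspaces rather than in the full space.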
Keywords/Search Tags: subspace clustering, high-dimensional data, density, CT-tree, grid, weighted attribute relativity