
Research And Improvement Of Multi-density-based Clustering Algorithms

Posted on: 2013-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2248330371987130
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important research topic in data mining and has received extensive attention from scholars in China and abroad. Because density-based clustering algorithms can find clusters of arbitrary shape and are robust to noise, they have been widely studied and applied. This paper studies the classic density-based clustering algorithm DBSCAN and discusses its advantages and disadvantages. DBSCAN uses a global density parameter, so it cannot handle multi-density datasets. To address this issue, this paper proposes three improved algorithms.

LODCMD algorithm (Local-Outlier-Degree-based Clustering algorithm for Multi-density Datasets): The algorithm introduces the concept of local-outlier-degree. For a given point, it takes the ratio of each neighbor's neighborhood density to the point's own neighborhood density and uses the average of these ratios as the point's local-outlier-degree. Because the neighborhoods of the points differ, the local-outlier-degree changes dynamically from point to point, so it can describe the different density distributions of the data points.

RDCMD algorithm (Relative-Density-based Clustering algorithm for Multi-density Datasets): The algorithm introduces the concept of relative density. It uses the ratio of a point's density to its neighborhood density to indicate the relative density of the point. Because the neighborhoods of the points differ, the relative density also changes dynamically. In this way, the algorithm can adapt to multi-density datasets.

SCMDFC algorithm (Semi-supervised Clustering algorithm for Multi-density Datasets with Fewer Constraints): Semi-supervised clustering algorithms use a small amount of prior information to supervise the clustering process and thereby improve clustering quality. This paper proposes the SCMDFC algorithm, which is based on the SCMD algorithm. The algorithm makes full use of the information in the must-link and cannot-link constraint sets. It then selects parameter values that reflect the density distribution of the dataset, so it can find cluster structures of different densities.

This paper compares the three new algorithms with other density-based clustering algorithms. Experimental results show that the proposed algorithms achieve higher clustering quality on multi-density datasets.
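The two density ratios described for LODCMD and RDCMD can be illustrated with a minimal Python sketch. The abstract does not give exact formulas, so several choices here are assumptions rather than the thesis's definitions: neighborhoods are taken to be the k nearest neighbors, a point's density is taken to be the inverse of its mean distance to those neighbors, and "neighborhood density" in the relative-density ratio is read as the mean density of the neighbors. The function names (`knn`, `density`, `local_outlier_degree`, `relative_density`) are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def density(points, i, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbors.

    Assumes no duplicate points, so the mean distance is never zero.
    """
    nbrs = knn(points, i, k)
    mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return 1.0 / mean_dist


def local_outlier_degree(points, i, k):
    """Average ratio of each neighbor's density to the density of points[i]."""
    own = density(points, i, k)
    nbrs = knn(points, i, k)
    return sum(density(points, j, k) / own for j in nbrs) / len(nbrs)


def relative_density(points, i, k):
    """Ratio of the density of points[i] to the mean density of its neighbors."""
    nbrs = knn(points, i, k)
    nbr_density = sum(density(points, j, k) for j in nbrs) / len(nbrs)
    return density(points, i, k) / nbr_density


if __name__ == "__main__":
    # A dense cluster, a sparse cluster, and one isolated point between them.
    data = [
        (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),          # dense
        (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),  # sparse
        (5.0, 5.0),                                              # isolated
    ]
    for idx, p in enumerate(data):
        print(p, local_outlier_degree(data, idx, 3), relative_density(data, idx, 3))
```

Under these assumptions, points inside the dense cluster and points inside the sparse cluster both score close to 1 on either measure, even though their absolute densities differ by an order of magnitude, while the isolated point scores far from 1. This is the property that lets a local ratio stand in for DBSCAN's single global density parameter on multi-density data.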
Keywords/Search Tags: clustering, multi-density, local-outlier-degree, relative-density, semi-supervised