
The Research And Improvement Of Density-based Clustering Algorithm

Posted on: 2014-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: M X Qian
Full Text: PDF
GTID: 2308330461473966
Subject: Computer software and theory
Abstract/Summary:
After several decades of development, computer technology is now applied in almost every field. The growth of these applications, however, makes data analysis much harder. Data mining can uncover the implicit rules in data while analyzing and processing it, which gives people a powerful tool for getting the most out of their data. It is therefore important for society to make full use of data mining. As one data mining method, clustering analysis groups data through unsupervised learning. It has become a hot research topic worldwide and is widely used in many areas. However, as practical problems grow more complex, people demand higher performance from clustering analysis, so its study faces new challenges. Density-based clustering is one of the important methods of clustering analysis; it can find clusters of arbitrary shape by using the concept of density. However, density-based clustering algorithms have many shortcomings, such as sensitivity to parameters and an inability to handle datasets with multi-density distributions efficiently. Density-based clustering algorithms therefore need further research.

Based on a study of density-based clustering algorithms, the paper analyzes and discusses their shortcomings and proposes two improved algorithms that address these problems of earlier density-based clustering algorithms.

First, traditional density-based clustering algorithms have parameters that are difficult to set, and those parameters are single, global values. To address this, a parameter-free multi-density clustering algorithm based on one-dimensional projection analysis (PFMDBSCAN) is proposed. PFMDBSCAN first projects the data onto one dimension, then computes a kernel density estimate of the projected data to find the dense data partitions. Finally, it obtains density parameters for each partition, which yields a parameter-free multi-density clustering algorithm.

Second, the paper analyzes the properties of the minimum spanning tree, which extends according to the shape of the dataset and describes the density of different clusters through its edge weights. We introduce the idea of the minimum spanning tree into density-based clustering and propose a parameter-free multi-density clustering algorithm based on the minimum spanning tree (MST-DBSCAN). MST-DBSCAN first constructs the minimum spanning tree of the data and keeps its edge set. It then distinguishes the different clusters by analyzing the distribution of the edges. Finally, it uses the edge information to select representative points, estimates the density of the different clusters from them, and finds the multi-density clusters.

The paper runs experiments on each of the two proposed algorithms and analyzes and compares the results. The results show that both can find clusters with multiple densities and arbitrary shapes and achieve improved clustering results.
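
The PFMDBSCAN steps summarized above (one-dimensional projection, kernel density estimation, dense partitions, per-partition density parameters) can be illustrated with a minimal sketch. It is not the thesis's implementation. Several things are assumptions made only for this example: the projection is taken along a single coordinate axis, the kernel is Gaussian with a hand-picked bandwidth, partitions are cut at local minima of the estimated density, and the helper `mean_gap` is a hypothetical stand-in for whatever per-partition density parameter the thesis derives. The abstract specifies none of these details.

```python
import math

def project_onto_axis(points, axis=0):
    """Project points onto one coordinate axis (assumed projection for this sketch)."""
    return [p[axis] for p in points]

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid value."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

def split_at_density_minima(samples, bandwidth, grid_size=200):
    """Partition 1-D samples at the local minima of their estimated density."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(samples, grid, bandwidth)
    # Interior grid values where the density stops falling and starts rising.
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]
    partitions = [[] for _ in range(len(cuts) + 1)]
    for s in samples:
        # A sample belongs to the partition indexed by the number of cuts below it.
        partitions[sum(1 for c in cuts if s > c)].append(s)
    return [p for p in partitions if p]  # drop partitions that received no samples

def mean_gap(partition):
    """Hypothetical per-partition density parameter: mean spacing of the sorted values."""
    if len(partition) < 2:
        return 0.0
    return (max(partition) - min(partition)) / (len(partition) - 1)

# Toy 2-D data: one tight group near x = 0 and one loose group around x = 10.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.05),
    (8.0, 1.0), (9.0, 2.0), (10.0, 0.5), (11.0, 1.5), (12.0, 1.0),
]
projected = project_onto_axis(points)
partitions = split_at_density_minima(projected, bandwidth=1.0)
for part in partitions:
    print(sorted(part), mean_gap(part))
```

On this toy data the two groups land in separate partitions, and the loose group gets a larger spacing value than the tight one, which is the kind of per-partition, non-global parameter the abstract describes. The fixed bandwidth is itself a parameter; how the thesis avoids setting one is not stated in the abstract.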
Keywords/Search Tags: density-based clustering, multi-density, projection, kernel density estimation, minimum spanning tree