A Self-adaptive Density-based Clustering Algorithm For Discovering Density Varied Clusters

Posted on: 2016-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Xie
GTID: 2308330479984822
Subject: Computer system architecture
Abstract/Summary:
With the advent of the big data era, people's lives, work, and ways of thinking are undergoing tremendous change. Faced with massive data, how to find valuable information and interesting knowledge has become an important and meaningful research topic. Data mining plays a very important role in many data analysis and knowledge discovery processes.

As an important data mining technique, clustering analysis has been widely used in a variety of data analysis applications. Clustering analysis recognizes the data distribution and interesting correlations among the patterns of a dataset, then groups the dataset into several meaningful clusters, each composed of similar objects. Because clustering analysis is so widely used, improving the effectiveness of clustering algorithms is necessary. Density-based clustering, a well-known family of clustering algorithms, can find arbitrarily shaped clusters in noisy data via density extension. With the explosive growth of information, datasets have become increasingly varied and complex, so improving the adaptability of clustering algorithms to different datasets and the accuracy of clustering results has become a meaningful and challenging research problem.

DBSCAN, a classical density-based clustering algorithm, can find arbitrarily shaped clusters in datasets that contain noise and outliers. However, DBSCAN is known to have a number of problems: (a) it requires the user to specify parameter values; (b) it cannot deal with varying densities because it adopts global parameters; (c) it incurs a certain amount of computational and I/O cost. To address the first two problems, SADBSCAN-DLP (a self-adaptive multi-density DBSCAN based on Density Levels Partitioning) is proposed. The proposed algorithm combines density levels partitioning with the idea of the CEI (cluster effect index). SADBSCAN-DLP uses the k-nearest-neighbor distance as a density measure to characterize the density distribution of the dataset and obtain the KNN matrix.
According to the value of CEI, the algorithm determines MinPts. It then derives the density-level jump threshold from statistical information about the distribution of density variation and uses this threshold to partition the dataset into different density level sets. It estimates Eps for each density level set; for each value of Eps, the DBSCAN algorithm is applied to obtain local clustering results, and the local clustering results are combined to obtain the final clusters. Experimental results are obtained on real datasets from UCI. The final results show that the proposed algorithm achieves good results compared with the original DBSCAN and DBSCAN-DLP algorithms.
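
The density-level partitioning step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the formulas for CEI, the jump threshold, or the Eps estimate, so the threshold here (mean plus a multiple of the standard deviation of the relative k-distance increases), the median-based Eps, and all function and parameter names are hypothetical. The value of k is passed in by hand rather than derived from CEI, and the per-level DBSCAN runs and the merging of local results are omitted.

```python
import math
import statistics


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def partition_density_levels(kdist, jump_factor=1.0):
    """Split point indices into density levels at large jumps in k-distance.

    Assumption (not from the thesis): a jump is a relative increase between
    consecutive sorted k-distances that exceeds
    mean + jump_factor * standard deviation of all relative increases.
    """
    order = sorted(range(len(kdist)), key=lambda i: kdist[i])
    if len(order) < 2:
        return [order]
    sorted_d = [kdist[i] for i in order]
    # Relative increase from each sorted k-distance to the next one.
    variation = [
        (b - a) / a if a > 0 else 0.0
        for a, b in zip(sorted_d, sorted_d[1:])
    ]
    threshold = statistics.mean(variation) + jump_factor * statistics.pstdev(variation)

    levels, current = [], [order[0]]
    for idx, var in zip(order[1:], variation):
        if var > threshold:  # density jump: close the current level
            levels.append(current)
            current = []
        current.append(idx)
    levels.append(current)
    return levels


def estimate_eps(kdist, level):
    """Hypothetical per-level Eps: the median k-distance within the level."""
    return statistics.median([kdist[i] for i in level])


# Two grids with very different spacing stand in for a multi-density dataset.
dense = [(i * 0.1, j * 0.1) for i in range(3) for j in range(3)]
sparse = [(10 + i * 1.0, 10 + j * 1.0) for i in range(3) for j in range(3)]
points = dense + sparse

kdist = k_distances(points, k=2)
levels = partition_density_levels(kdist)
eps_per_level = [estimate_eps(kdist, level) for level in levels]
```

Each entry of `eps_per_level` would then serve as the Eps for a separate DBSCAN run restricted to the points of that level, which is how a single global Eps is avoided.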
Keywords/Search Tags:DBSCAN, CEI, Varied Density, Self-adaptive, Partitioning