Study On Improvement To Density-Based Clustering Algorithm

Posted on: 2008-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Gao
GTID: 2178360242967269
Subject: Software engineering
Abstract/Summary:
Many clustering algorithms have been proposed and are widely applied in fields such as data mining, pattern recognition, data analysis, image processing, spatial databases, biology, and market research. These applications impose many constraints that must be satisfied to generate genuine clusters, and finding clusters that both satisfy specific constraints and are of good quality remains challenging.

Density-based clustering algorithms can discover clusters of arbitrary shape, identify noise, and are insensitive to the input order of the data objects; this flexibility has led to their wide use in many fields. However, most of them are sensitive to their parameters and are not effective on datasets of varying density. These shortcomings limit the applicability of density-based algorithms to some extent. How to cluster datasets of varying density while reducing the parameter sensitivity of density-based algorithms is therefore an open issue.

This thesis proposes a novel Density-Tag Based Clustering algorithm, DTBC for short. DTBC introduces the concept of a density-tag, which marks density distribution information of the dataset. DTBC first uses the k-nearest neighbor method to build sub-clusters and then analyzes the density distribution of those sub-clusters. According to that distribution, it marks the sub-clusters with corresponding density-tags, thereby obtaining the density distribution of the entire dataset. Finally, it generates the genuine clusters according to the density-tags. The main advantages of DTBC are its effectiveness on datasets of varying density and its insensitivity to the single parameter it requires. The experimental results suggest that DTBC is better suited than DBSCAN and KNNCLUST to discovering clusters in varied datasets and is less sensitive to its parameter.
Keywords/Search Tags:Clustering, K-nearest Neighbor, Density-Tag
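
The abstract outlines a pipeline (k-nearest-neighbor sub-clusters, density-tags, tag-driven merging) but does not give the tagging or merging rules. The following is only a rough sketch of that general idea, not the thesis's DTBC algorithm. It assumes that each point together with its k nearest neighbors acts as a sub-cluster, that a density-tag is the mean k-NN distance bucketed on a log scale, and that sub-clusters merge only when they are k-NN linked and share a tag. The names `k_nearest`, `density_tag`, and `cluster_by_density_tag` are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math


def k_nearest(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append(ranked[:k])
    return neighbours


def density_tag(mean_distance, base=2.0):
    """Assumed tagging rule (not from the thesis): log-scale bucket of the
    mean k-NN distance, so similar local densities share one integer tag."""
    return math.floor(math.log(max(mean_distance, 1e-12), base))


def cluster_by_density_tag(points, k):
    """Return one cluster label per point. Requires 1 <= k < len(points)."""
    neighbours = k_nearest(points, k)
    tags = [density_tag(sum(d for d, _ in nb) / len(nb)) for nb in neighbours]

    # Union-find over points: merge k-NN-linked points that carry the same tag.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, nb in enumerate(neighbours):
        for _, j in nb:
            if tags[i] == tags[j]:
                parent[find(i)] = find(j)

    # Relabel union-find roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


# Two grids with very different spacing, i.e. very different density.
dense = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
sparse = [(10 + 3.0 * i, 10 + 3.0 * j) for i in range(3) for j in range(3)]
labels = cluster_by_density_tag(dense + sparse, k=3)
print(labels)
```

In this sketch `k` is the only parameter, mirroring the single-parameter claim in the abstract. The log base used for bucketing is a further assumption that a real implementation would have to justify or remove.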