Study On A New Layered Clustering Algorithm

Posted on: 2009-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: D Q Zhou
Full Text: PDF
GTID: 2178360245489596
Subject: Computer application technology
Abstract/Summary:
Data mining is a recently developed technology that attracts many researchers, and clustering is an important part of it. Most clustering algorithms depend on user-supplied parameters: the K-means algorithm requires the user to input the number of clusters, while DBSCAN and other density-based algorithms need the user to provide a density parameter. However, most of these parameters are hard to obtain.

This thesis presents a new layered clustering algorithm to deal with this problem. The main idea is as follows. First, cluster the data using the Unit Distance, which generates many atom clusters. Then cluster the atom clusters using the number of outlier points, which can be obtained easily. After analyzing how clusters can be described by representatives, the thesis gives a new cluster description algorithm, the border representatives algorithm, which can be used for clustering large datasets.

The main work of the thesis includes:

(1) The thesis presents the concept of Unit Distance. If the data objects in the data space are uniformly distributed, clustering them with the shortest distance between objects yields only one cluster. This shortest distance is called the Unit Distance. Clustering data with the Unit Distance produces the atom clusters.

(2) The thesis presents the idea of Outlier Point optimizing. The outlier points of a clustering are deemed to be special data. Assuming that the probability of special data is fixed, the number of outlier points can be estimated from the data size and the probability of special data. This parameter is generally easier to obtain than other parameters. The expected number of outlier points can be used to optimize the atom clusters.

(3) Integrating the ideas of Unit Distance and Outlier Point optimizing, the thesis presents UDBA (Unit Distance Based Algorithm). The thesis lists the detailed steps of the UDBA algorithm, and it analyzes and compares the clustering results of UDBA with those of the CHAMELEON algorithm.

(4) The thesis presents a translation method that converts the clusters produced by traditional algorithms into border representatives. The experiment illustrates the key characteristic of border representatives: they can represent a complex cluster shape with only a few representatives. This may help in clustering large data sets.
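As a rough illustration of ideas (1) and (2), the sketch below builds atom clusters by linking any two points whose distance does not exceed a threshold, and estimates the expected number of outlier points as data size times a fixed probability. It is not the thesis's UDBA: the function names (`atom_clusters`, `shortest_distance`, `expected_outliers`), the use of Euclidean distance, the tolerance, and the choice of the global shortest pairwise distance as the Unit Distance are all assumptions made for illustration, and the step that merges atom clusters using the outlier estimate is not shown because the abstract does not specify it.

```python
from itertools import combinations
from math import dist


def shortest_distance(points):
    """Smallest pairwise Euclidean distance (assumed here to be the Unit Distance)."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def atom_clusters(points, unit_distance, tol=1e-9):
    """Connected components of the graph that links two points whenever
    their distance is at most unit_distance (hypothetical helper)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= unit_distance + tol:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def expected_outliers(n_points, outlier_probability):
    """Expected outlier count under a fixed per-point probability (assumption)."""
    return n_points * outlier_probability


# A uniform 3x3 grid, a distant pair, and one isolated point.
grid = [(x, y) for x in range(3) for y in range(3)]
data = grid + [(10, 10), (10, 11)] + [(20, 20)]

d = shortest_distance(data)
clusters = atom_clusters(data, d)
print(d, sorted(len(c) for c in clusters))
```

On the uniform grid alone, linking at the shortest distance joins every point into a single cluster, which matches the property the abstract uses to define the Unit Distance. The pairwise loop is quadratic in the number of points, so a spatial index would be needed for large data sets.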
Keywords/Search Tags: Clustering Analysis, Unit Distance, Outlier Point Analysis, Represents, Border Represents