Study On A New Layered Clustering Algorithm

Posted on:2009-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:D Q Zhou

Full Text:PDF

GTID:2178360245489596

Subject:Computer application technology

Abstract/Summary:

Data mining is an actively developing technology that attracts many researchers, and clustering is an important part of it. Most clustering algorithms depend on user-supplied parameters: the K-means algorithm requires the user to input the number of clusters, while DBSCAN and other density-based algorithms require a density parameter. However, most of these parameters are hard to obtain. This thesis presents a new layered clustering algorithm to address this problem. The main idea is as follows: first, cluster the data using the Unit Distance, which generates many atom clusters; then cluster the atom clusters using the number of outlier points, which can be obtained easily. After analyzing how clusters can be described by representatives, the thesis also gives a new cluster-description algorithm, the border representatives algorithm, which can be used in clustering large datasets.

The main work of the thesis includes:

(1) The thesis presents the concept of Unit Distance. If the data objects are evenly distributed in the data space, clustering them with the shortest distance between objects yields only one cluster. This shortest distance is called the Unit Distance. Clustering data with the Unit Distance produces the atom clusters.

(2) The thesis presents the idea of Outlier Point optimization. Outlier points in clustering are regarded as special data. Assuming that the probability of special data is fixed, the number of outlier points can be estimated from the data size and that probability. This parameter is generally easier to obtain than other parameters. The expected number of outlier points can be used to optimize the atom clusters.

(3) Integrating the ideas of Unit Distance and Outlier Point optimization, the thesis presents UDBA (Unit Distance Based Algorithm). The thesis lists the detailed steps of UDBA, and analyzes and compares its clustering results with those of the CHAMELEON algorithm.

(4) The thesis presents a conversion method that converts the clusters produced by traditional algorithms into border representatives. The experiments illustrate the characteristic of border representatives: they can represent a complex cluster shape with only a few representatives. This may help in clustering large data sets.
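
The two ideas in items (1) and (2) can be illustrated with a minimal Python sketch. It is not the UDBA algorithm from the thesis, whose detailed steps are not given in this abstract. It assumes that atom clusters can be read as groups of points chained together by links no longer than the Unit Distance, and that the expected outlier count is simply the data size multiplied by a fixed outlier probability. The names atom_clusters, expected_outliers, unit_distance and outlier_probability are hypothetical, and the Unit Distance is passed in by hand rather than derived from the data.

```python
from itertools import combinations
from math import dist


def atom_clusters(points, unit_distance):
    """Group point indices into clusters whose members are chained by
    links of length <= unit_distance (an assumed reading of 'atom clusters')."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lies within the unit distance.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= unit_distance:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def expected_outliers(n_points, outlier_probability):
    """Expected outlier count, assuming a fixed outlier probability."""
    return round(n_points * outlier_probability)


# Illustrative data only: two tight groups and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (50, 50)]
print(atom_clusters(points, unit_distance=1.0))  # [[0, 1, 2], [3, 4], [5]]
print(expected_outliers(1000, 0.01))             # 10
```

The pairwise loop is quadratic in the number of points, which is acceptable for a small illustration; a spatial index would be the usual choice for large datasets. How the thesis uses the expected outlier count to merge or prune atom clusters is not described in the abstract, so that step is left out.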

Keywords/Search Tags:

Clustering Analysis, Unit Distance, Outlier Point Analysis, Representatives, Border Representatives

Related items

1	A Study Of Sparse Subspace Clustering For Image Sequence And Its Applications
2	Research And Analysis On Distance-based Outlier Detection
3	The Study, Distance-based Clustering And Outlier Detection
4	Research On Outlier Detection Based On Density Difference
5	Research On Outlier Detection For Reconstructed Point Clouds Based On Images
6	Study On An Analysis Method For Cluster-based Outlier
7	Build, Based On Unit Outlier Algorithm And Customer Loyalty Analysis System
8	The Research On Clustering Algorithm For Categorical Data Based-on Rough Set
9	Analysis And Research Of Distance-Based Outliers
10	Study On The Algorithms Of Clustering And Outlier Detection Based On Neighborhood