Application Of Grid And Density Based Clustering Algorithm In Data Mining

Posted on:2006-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:S Tao

GTID:2168360155964868

Subject:Management Science and Engineering

Abstract/Summary:

Clustering is an important research direction in data mining. It is a quantitative method for grouping datasets whose records are described by many attributes. The basic principle is to measure how close or distant data samples are, using a similarity or dissimilarity index computed from their attributes, and then to group the samples according to those relationships.
The grid and density based clustering algorithm studied here differs from classic distance-based clustering algorithms such as k-means and k-medoids. It takes a different approach with several properties: it can find clusters in subspaces of high-dimensional data; it scales to datasets that exceed memory capacity; it describes each cluster in disjunctive normal form (DNF), which is easy for users to understand; it makes no assumption that the data follow a particular distribution model; and its results do not depend on the input order of the data.
Testing led to the following conclusion: the grid and density based clustering algorithm performs well at automatically finding interesting subspaces in high-dimensional data and then identifying accurate clusters within those subspaces.
Grid and density based clustering has been applied in medicine, electronic communications, insurance, and other fields. This thesis applies it to customer clustering at a telephone company in the electronic communications sector. Customer information is analyzed to group customers into clusters, so that the company can develop better strategies based on the characteristics of each customer cluster, retain its regular customers, attract new ones, and thereby expect to raise its profit further.
This thesis also modifies the data preparation and the implementation of the original algorithm. These changes simplify the implementation and make the algorithm easier to understand, with the aim of improving its usability and operational efficiency.
In addition, the thesis compares the clustering characteristics and results of the classic k-means algorithm with those of the grid and density based algorithm, highlighting the latter's advantages in efficiency and intelligibility when clustering large, high-dimensional datasets.
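The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of the general grid and density idea, not the thesis's implementation. It assumes a uniform grid over the full data space, a fixed count threshold for a "dense" cell, and face-adjacency for joining dense cells. The names `grid_density_clusters`, `describe_as_dnf`, `bins`, `density_threshold`, and `bounds` are hypothetical. The search for interesting subspaces and the minimization of the DNF description are left out.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, bins=4, density_threshold=3, bounds=(0.0, 1.0)):
    """Bin points into a uniform grid and join adjacent dense cells into clusters.

    Assumes every coordinate lies within `bounds` (an illustrative simplification).
    """
    lo, hi = bounds
    width = (hi - lo) / bins

    def cell_of(point):
        # Clamp the index so that a coordinate equal to `hi` lands in the last interval.
        return tuple(max(0, min(int((x - lo) / width), bins - 1)) for x in point)

    # Step 1: record which points fall into each grid cell.
    cells = defaultdict(list)
    for index, point in enumerate(points):
        cells[cell_of(point)].append(index)

    # Step 2: a cell is dense when it holds at least `density_threshold` points.
    dense = {cell for cell, members in cells.items() if len(members) >= density_threshold}

    # Step 3: clusters are connected components of dense cells that share a face.
    clusters, seen = [], set()
    for start in sorted(dense):  # sorted, so the result ignores input order
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for dim in range(len(cell)):
                for step in (-1, 1):
                    neighbour = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters, dict(cells)


def describe_as_dnf(component, bins, bounds, names):
    """Describe a cluster as an OR of cells, each cell an AND of attribute intervals.

    The expression is not minimized; adjacent cells are not merged into larger regions.
    """
    lo, hi = bounds
    width = (hi - lo) / bins
    terms = []
    for cell in component:
        conjunction = " AND ".join(
            f"{lo + i * width:.2f} <= {name} < {lo + (i + 1) * width:.2f}"
            for name, i in zip(names, cell)
        )
        terms.append(f"({conjunction})")
    return " OR ".join(terms)


if __name__ == "__main__":
    # Made-up two-attribute data, purely for illustration.
    sample = [(0.05, 0.05), (0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.3, 0.2),
              (0.4, 0.15), (0.9, 0.9), (0.8, 0.8), (0.95, 0.85), (0.6, 0.6)]
    found, _ = grid_density_clusters(sample, bins=4, density_threshold=3)
    for component in found:
        print(describe_as_dnf(component, 4, (0.0, 1.0), ["x", "y"]))
```

Because points are only counted per cell and dense cells are visited in sorted order, the clusters found do not change when the input is shuffled, and no distribution model is assumed. The interval description is a rough analogue of the DNF output the abstract mentions.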

Keywords/Search Tags:

Clustering, High dimensional dataset, Grid and density, Interesting subspace of knowledge

Related items

1	Research Of Subspace-clustering Algorithms Based On Density Over High-dimensional Data
2	Research On Subspace Clustering Algorithms For High-dimensional Data
3	ESCHCD: Entropy-based Algorithm For Subspace Clustering With High Dimensional Categorical Datasets
4	Research On Subspace Clustering Algorithms Based On Density
5	Semi-supervised Subspace Clustering Based On Space-level Constraint
6	Research On Clustering Method Of Datastream Based On Grid And Density
7	Research On High Dimensional Data Clustering Algorithm Based On Subspace And Density Peak
8	Research On Clustering Algorithm Based On Irregular Grid And Subspace Of Descending Dimension
9	Application And Research On Clustering Algorithm In Large Scale High Dimensional Datasets
10	Research On Clustering Algorithm Of High Dimensional Data