
Application Of Grid And Density Based Clustering Algorithm In Data Mining

Posted on: 2006-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: S Tao
GTID: 2168360155964868  Subject: Management Science and Engineering
Abstract/Summary:
Clustering is one of the important research directions in data mining. It is a quantitative method for grouping a dataset whose records are described by many attributes. The basic principle is to measure, with a similarity or dissimilarity index computed from the attributes, how close or distant the data samples are from one another, and then to group the samples according to these relations.
The grid and density based clustering algorithm studied here differs from classic distance-based clustering algorithms such as k-means and k-medoids. It takes a different approach with several properties: it can find clusters in subspaces of high dimensional data; it scales to datasets that exceed the capacity of memory; it describes each cluster in disjunctive normal form (DNF), so the result is easy for users to understand; it makes no assumption that the data follows a particular distribution model; and its result does not depend on the input order of the data.
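
The grid and density idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it works only in the full data space (the subspace search is omitted), it assumes every coordinate lies in the range [lo, hi], and the names `grid_density_clusters`, `describe_dnf`, `bins`, and `min_count` are hypothetical choices made for this example. The sketch divides each dimension into equal intervals, keeps the cells that hold at least `min_count` points, joins dense cells that share a face into clusters, and prints each cluster as a DNF expression over the cell intervals.

```python
from collections import Counter, deque


def grid_density_clusters(points, bins, min_count, lo=0.0, hi=1.0):
    """Group points by connecting adjacent dense grid cells (illustrative only)."""
    width = (hi - lo) / bins

    def cell_of(point):
        # Map each coordinate to its interval index; clamp the upper edge.
        return tuple(min(int((x - lo) / width), bins - 1) for x in point)

    # Count the points per cell and keep the cells that meet the density threshold.
    counts = Counter(cell_of(p) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        # Breadth-first search over dense cells that share a face.
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for d in range(len(cell)):
                for step in (-1, 1):
                    neighbour = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def describe_dnf(cluster, bins, lo=0.0, hi=1.0):
    """Describe a cluster as an OR of per-cell AND conditions (a DNF expression)."""
    width = (hi - lo) / bins
    terms = []
    for cell in cluster:
        conjunction = " AND ".join(
            f"{lo + i * width:.2f} <= x{d} < {lo + (i + 1) * width:.2f}"
            for d, i in enumerate(cell)
        )
        terms.append(f"({conjunction})")
    return " OR ".join(terms)


# Made-up sample data: two dense regions and one isolated point.
points = [
    (0.05, 0.05), (0.10, 0.10), (0.15, 0.20),
    (0.30, 0.10), (0.35, 0.20),
    (0.90, 0.90), (0.95, 0.85),
    (0.50, 0.50),
]
clusters = grid_density_clusters(points, bins=4, min_count=2)
for cluster in clusters:
    print(describe_dnf(cluster, bins=4))
```

Because the result depends only on the per-cell counts, shuffling `points` gives the same clusters, which matches the order-independence property claimed above. The isolated point falls in a sparse cell and is left out of every cluster.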
Our tests lead to the following conclusion: the grid and density based clustering algorithm works well at automatically finding interesting subspaces in high dimensional data and then finding accurate clusters within those subspaces.
Grid and density based clustering has been applied in medical science, telecommunications, insurance, and other fields. This thesis applies it to customer clustering at a telephone company: customer information is analyzed with this method to divide the customers into clusters, so that the company can devise better strategies based on the characteristics of each customer cluster, retain its regular customers, and win new ones, with the expectation of raising its profit further.
This thesis also modifies the data preparation and the way the original algorithm is implemented, which simplifies the implementation and makes the algorithm easier to understand, with the aim of improving its usability and operational efficiency.
In addition, the thesis compares the clustering characteristics and results of the classic k-means algorithm with those of the grid and density based algorithm, and highlights the latter's advantages in efficiency and intelligibility when clustering large, high dimensional datasets.
Keywords/Search Tags: Clustering, High dimensional dataset, Grid and density, Interesting subspace of knowledge