
A Density And Information Grid Based Algorithm For Clustering

Posted on: 2013-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Zhao
Full Text: PDF
GTID: 2218330362963620
Subject: Software engineering
Abstract/Summary:
As computers have become common in daily life and production, a great amount of data has been collected, and it keeps growing day by day. According to the statistics, the data produced on the internet each day is as much as 16,800 DVD discs can hold; 2 million blog articles are published; and over 2.5 million photos are uploaded to Facebook. A great deal of information and knowledge must be hidden behind such huge amounts of data, so it is important to find a way to extract it. Data mining is one such way: it extracts information and knowledge that are useful to life and production from data that has been preprocessed by an ETL (extraction, transformation and loading) utility.

Cluster analysis, a form of unsupervised learning, is one kind of data mining activity. As the saying goes, birds of a feather flock together, and cluster analysis exploits exactly this natural tendency: it groups data items that are similar to each other according to some rules. Clustering methods include partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods.

This paper first describes the basic theory of data mining and gives a general introduction to clustering algorithms. On this basis, it describes in detail the combination of density-based and grid-based methods, which is one of the focuses of clustering research. Density-based clustering algorithms can find clusters of arbitrary shape, but they perform poorly. Grid-based algorithms perform very well, but they usually fail to identify the noise data in border grids. The algorithm DDBGC (Density Diffusion Based Grid Clustering) is therefore proposed, drawing on the features of both density-based and grid-based algorithms. DDBGC is a two-phase algorithm: it first obtains elementary clusters with a grid-based algorithm, and then uses the density-based idea to refine those clusters. The density diffusion grid-based clustering method is concise and efficient, and it solves the noise identification problem of traditional grid-based algorithms.

The proposed algorithm is implemented on the Weka 3.6 platform. Because Weka already provides features such as reading and displaying data, a complete clustering experiment could be carried out. The experiments show that the proposed clustering method not only performs well but also solves the noise identification problem.
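The two-phase idea described above (grid-based elementary clusters first, density-based refinement second) can be illustrated with a minimal Python sketch. This is not the thesis's DDBGC implementation, which the abstract does not specify and which was written for Weka. The function name `grid_density_cluster`, its parameters (`cell_size`, `dense_threshold`, `eps`, `min_neighbors`) and the voting rule used for refinement are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_density_cluster(points, cell_size, dense_threshold, eps, min_neighbors):
    """Illustrative two-phase clustering; returns one label per point, -1 = noise.

    All parameter names are hypothetical; the thesis abstract gives no details.
    """
    # Phase 1a: bin every point into a grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(math.floor(c / cell_size)) for c in p)
        cells[key].append(idx)

    # A cell is "dense" when it holds at least dense_threshold points.
    dense = {k for k, members in cells.items() if len(members) >= dense_threshold}
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    def neighbors(key):
        return [tuple(k + d for k, d in zip(key, o)) for o in offsets]

    # Phase 1b: seed filling (flood fill) joins adjacent dense cells
    # into elementary clusters.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    labels = [-1] * len(points)
    for key, lab in cell_label.items():
        for idx in cells[key]:
            labels[idx] = lab

    # Phase 2 (assumed rule): a point in a sparse cell joins the cluster that
    # has at least min_neighbors members within eps of it; otherwise it
    # stays labelled as noise.
    for key, members in cells.items():
        if key in dense:
            continue
        candidates = [i for nb in neighbors(key) if nb in dense for i in cells[nb]]
        for idx in members:
            votes = defaultdict(int)
            for j in candidates:
                if math.dist(points[idx], points[j]) <= eps:
                    votes[labels[j]] += 1
            if votes:
                best = max(votes, key=votes.get)
                if votes[best] >= min_neighbors:
                    labels[idx] = best
    return labels
```

The point of the refinement pass is that a sparse border cell is not discarded or absorbed wholesale: each of its points is checked individually, so a point lying just outside a dense cell can join the cluster while a more distant point in the same cell remains noise.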
Keywords/Search Tags: Data Mining, STING, Seed filling, Density Diffusibility, Grid-based Clustering