Grid-based Clustering Analysis And Visualization

Posted on: 2009-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: M M Wei
Full Text: PDF
GTID: 2178360272985901
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Data visualization is a new technique for data analysis and processing that has become a key part of data mining research. It makes it easier to understand the natural structure of a large multivariate dataset and to find relationships among the data. Clustering, an unsupervised learning technique for discovering the data distribution and underlying data patterns, is one of the most important parts of data visualization. This dissertation focuses on the development of effective grid-based clustering algorithms, together with validity measures and visualization. The major contributions of this dissertation are as follows:

(1) A general grid-based clustering approach (GGCA) is proposed under a common hierarchical assumption. The GGCA is a nonparametric algorithm capable of identifying ideal clustering parameters that reveal a suitable cluster configuration, and it performs very well on clusters that are poorly separated or diverse in shape.

(2) A new semi-clustering method based on the OPTICS algorithm is proposed. It partitions the data space into a number of grid cells and orders representative points to identify the clustering structure. The new method has only linear complexity and is much faster than OPTICS.

(3) Based on the Gap statistic, a new method is proposed for estimating the number of clusters in a dataset. The technique uses the second-order difference of the within-cluster dispersion in place of the reference null distribution. The study shows that this makes the Gap statistic easier to implement and reduces its uncertainty in applications.

(4) To estimate the similarity between cluster centers, a new distance measure based on the grid algorithm is defined to replace the traditional Euclidean distance. A new method is then proposed that applies the second-order difference and the new distance measure to two traditional clustering validity indexes. The resulting method is an effective improvement over the original algorithm.
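
The idea in contribution (3), using a second-order (two-order) difference of the within-cluster dispersion to pick the number of clusters, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the difference is taken on log W_k as D(k) = log W_{k-1} - 2 log W_k + log W_{k+1}, the estimate is the k that maximizes D(k), and the dispersion values and the function names `second_order_difference` and `estimate_num_clusters` are hypothetical.

```python
import math


def second_order_difference(log_w):
    """Second-order difference of log dispersion at each interior k.

    ASSUMPTION: D(k) = log W(k-1) - 2*log W(k) + log W(k+1).
    The thesis abstract does not state its exact formula.
    """
    return {
        k: log_w[k - 1] - 2.0 * log_w[k] + log_w[k + 1]
        for k in sorted(log_w)
        if (k - 1) in log_w and (k + 1) in log_w
    }


def estimate_num_clusters(dispersion):
    """Pick the k with the largest second-order difference (an assumed rule).

    `dispersion` maps k -> within-cluster dispersion W_k, e.g. the
    within-cluster sum of squares obtained from any clustering run with k clusters.
    """
    log_w = {k: math.log(w) for k, w in dispersion.items()}
    diffs = second_order_difference(log_w)
    return max(diffs, key=diffs.get)


# Hypothetical dispersion values: a sharp drop up to k = 3, then a slow decline.
dispersion = {1: 1000.0, 2: 400.0, 3: 90.0, 4: 80.0, 5: 72.0, 6: 66.0}
best_k = estimate_num_clusters(dispersion)
print(best_k)
```

A large positive D(k) marks the "elbow" where the dispersion stops falling quickly, so no reference null distribution has to be simulated. The endpoints of the k range cannot be scored because the difference needs both neighbors.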
Keywords/Search Tags: data mining, clustering analysis, grid algorithm, visualization, clustering validity