The Research Of Nonparametric Clustering Boundary Detection Algorithm

Posted on: 2012-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Xu
GTID: 2218330338957202
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of productive forces, information resources are expanding quickly in today's information society. How to extract knowledge from large amounts of data in a non-trivial way has become a central issue for the information industry. Data mining can discover useful information and knowledge in large datasets, and it has therefore become a hot topic in the information age. Cluster analysis is one of the most important parts of data mining, and many of its research results have been widely applied in pattern recognition, data analysis, image processing, market research and other fields. Cluster boundary analysis is a branch of cluster analysis that plays an important role in cluster analysis, image retrieval and virtual reality. Research on cluster boundary points has already begun, but the algorithms proposed so far have several problems: for example, they require the user to input parameters chosen according to the distribution characteristics of the dataset, and they cannot identify boundary points effectively and efficiently in noisy datasets containing clusters of different shapes and sizes. In addition, most current clustering algorithms are independent of boundary detection algorithms, so clustering and boundary detection have not been integrated. To detect cluster boundary points automatically and effectively, and to eliminate the influence of parameters on the detection results, we use the distribution of boundary points together with the k-means clustering technique to compute the boundary threshold of a dataset automatically.
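The abstract does not give the exact procedure for deriving the boundary threshold with k-means. As a minimal sketch, assuming each point already carries a numeric boundary score (higher meaning more boundary-like), one-dimensional k-means with k = 2 can split the scores into an "interior" group and a "boundary" group, and the midpoint between the two centroids then serves as the threshold. The score and the function name `two_means_threshold` are hypothetical illustrations, not the thesis's actual method.

```python
def two_means_threshold(scores, max_iter=100):
    """Hypothetical helper: run 1-D k-means with k=2 on per-point scores and
    return the midpoint between the two centroids as a threshold.

    In one dimension, assigning each score to its nearest centroid is the same
    as comparing it with the midpoint of the two centroids.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo  # all scores equal: nothing to separate
    c_low, c_high = lo, hi  # initialize the two centroids at the extremes
    for _ in range(max_iter):
        threshold = (c_low + c_high) / 2
        low = [s for s in scores if s <= threshold]
        high = [s for s in scores if s > threshold]
        # Both groups stay non-empty: each centroid is the mean of its group,
        # so the group's extreme value lies on the correct side of the midpoint.
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if new_low == c_low and new_high == c_high:
            break  # assignment is stable, k-means has converged
        c_low, c_high = new_low, new_high
    return (c_low + c_high) / 2


# Made-up scores for illustration only.
scores = [0.1, 0.12, 0.15, 0.9, 0.95]
t = two_means_threshold(scores)
boundary = [i for i, s in enumerate(scores) if s > t]  # boundary == [3, 4]
```

Because the threshold comes from the data's own score distribution, the user never has to supply it, which is the property the abstract emphasizes.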
A new boundary detection algorithm, NPRIM (nonparametric boundary detection algorithm based on Delaunay triangulation), is presented. Existing boundary detection algorithms cannot identify boundary points in noisy datasets containing clusters of different shapes and sizes, and they do not integrate clustering with boundary detection. To address these problems, this thesis exploits the advantages of the minimum spanning tree and of triangulation, both of which naturally reflect the distribution of the data points. Combining the two yields 2-MSTCRIM, a new clustering-based boundary detection algorithm built on the minimum spanning tree. The thesis implements NPRIM and 2-MSTCRIM, runs extensive experiments on synthetic and real datasets, and compares them with BORDER, BRIM and other boundary detection algorithms. The results show that NPRIM and 2-MSTCRIM can identify boundary points effectively and efficiently in noisy datasets containing clusters of different shapes and sizes. NPRIM requires no input parameters, while 2-MSTCRIM achieves higher detection accuracy on multi-density datasets in which clusters lie close to one another, and it also performs clustering.
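The abstract does not spell out how 2-MSTCRIM combines the minimum spanning tree with boundary detection. The sketch below shows only the generic MST clustering step that such a method can build on: Prim's algorithm over the complete Euclidean graph, followed by cutting the k-1 longest tree edges so the remaining forest splits into k clusters. This is standard MST (single-linkage style) clustering, not 2-MSTCRIM itself; the function names and the fixed cluster count `k` are assumptions for illustration.

```python
import math


def prim_mst(points):
    """Return MST edges (weight, i, j) of the complete Euclidean graph (O(n^2) Prim)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax the edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Hypothetical helper: drop the k-1 longest MST edges and label the components."""
    edges = sorted(prim_mst(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]
    root = list(range(len(points)))  # union-find parent array

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels, label_of_root = [], {}
    for i in range(len(points)):
        r = find(i)
        label_of_root.setdefault(r, len(label_of_root))
        labels.append(label_of_root[r])
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(pts, 2)  # labels == [0, 0, 0, 1, 1, 1]
```

The quadratic Prim loop keeps the sketch dependency-free. For larger datasets, one can build the tree from a Delaunay triangulation instead, because the Euclidean MST is a subgraph of the Delaunay triangulation; this is also why the two structures pair naturally.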
Keywords: data mining, clustering, boundary points, parameter automation, multi-density, triangulation, minimum spanning tree