The Study And The Application Of Clustering Algorithm Based On Manifold Distance

Posted on: 2011-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: M Ma  Full Text: PDF
GTID: 2178360305464239  Subject: Intelligent information processing
Abstract/Summary:
Cluster analysis is a data reduction technique that groups variables or cases according to shared data characteristics. It is an effective data mining tool and is attracting wide attention. This thesis starts from two weaknesses of clustering algorithms: the choice of dissimilarity measure and the sensitivity to initial cluster centers. We replace the traditional Euclidean distance with a novel manifold dissimilarity measure and select the initial cluster centers with a global method. Two schemes are proposed to offset the higher computational cost that manifold distance brings. The two new methods are then applied to cluster analysis and image segmentation. The main contributions are summarized as follows.

We propose a manifold clustering algorithm named Global Prototypical Clustering Algorithm based on Manifold Distance (GPMC). In GPMC, the cluster representatives are chosen from the data set itself, and each data item is assigned to a cluster representative according to a novel manifold dissimilarity measure, which measures the geodesic distance along the manifold. GPMC selects the initial cluster centers from an optimization viewpoint using a global method. We apply GPMC to benchmark clustering problems on artificial data sets and UCI data sets. The experimental results show that, in terms of cluster quality and robustness, GPMC is able to identify complex and non-convex clusters.

In Section 3, a Two-Phase Clustering (TPC) algorithm is proposed for data sets with complex distributions. TPC is based on an improved K-Means and Manifold Evolutionary Clustering (MEC), and it proceeds in two phases. In the first phase, the data set is partitioned into sub-clusters with spherical distributions, and each cluster center serves as the representative of its sub-cluster. In the second phase, the cluster centers obtained in the first phase are themselves clustered with MEC, which performs well on data with complex distributions. The final result is obtained by combining the two clustering results. Introducing manifold distance into evolutionary clustering makes the algorithm capable of clustering complex data sets, and the new method also reduces the computational cost that manifold distance brings. Experimental results on two groups of data sets with different structures show that the new algorithm efficiently identifies clusters whether their distribution is simple or complex, convex or non-convex.

In Section 4, GPMC is combined with a morphological method to give a novel image segmentation algorithm named Global Prototypical Clustering Algorithm based on Watershed and Manifold Distance (WGPMC). An improved watershed algorithm, the marker-driven watershed transform, first segments the image roughly into many small regions. The image characteristics of each small region are then calculated, and a secondary segmentation is performed with GPMC. The final result is obtained from these two segmentation results. The new algorithm is applied to different image segmentation tasks, including synthetic aperture radar (SAR) images and natural images. The experimental results show that WGPMC segments a variety of images with high quality.
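The abstract does not state the formula behind the manifold dissimilarity measure, so the following is only a minimal sketch under stated assumptions. It assumes an edge length of the form rho**d - 1 (d is the Euclidean distance, rho > 1), a definition used in some manifold-distance clustering work, and takes the manifold distance to be the shortest-path length over the complete graph on the data. The names `manifold_distance`, `assign_to_prototypes`, and `rho` are hypothetical and are not taken from the thesis. The second function illustrates the prototype-assignment step described for GPMC. It does not implement the global center selection or the evolutionary search.

```python
import numpy as np


def manifold_distance(points, rho=2.0):
    """All-pairs manifold distance (illustrative, assumed definition).

    Assumption: the edge length between two points is rho**d - 1, where d
    is their Euclidean distance and rho > 1. The manifold distance is then
    the shortest-path length over the complete graph on the data.
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    dist = rho ** euclid - 1.0  # long hops cost far more than chains of short hops

    # Floyd-Warshall: allow each point k in turn as an intermediate stop.
    for k in range(len(pts)):
        dist = np.minimum(dist, dist[:, k, None] + dist[None, k, :])
    return dist


def assign_to_prototypes(dist, prototype_idx):
    """Label each point with the position of its nearest prototype."""
    return np.argmin(dist[:, prototype_idx], axis=1)


# Two well-separated groups on a line; points 0 and 3 act as prototypes.
points = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]]
dist = manifold_distance(points)
labels = assign_to_prototypes(dist, [0, 3])
```

Under this assumed edge length, travelling from the first point to the third through the middle point costs 1 + 1 = 2, which is less than the direct hop at 2**2 - 1 = 3. Paths that follow dense chains of points are therefore preferred over straight-line jumps. The all-pairs shortest-path step takes O(n^3) time in this naive form, which gives a sense of why reducing the cost of manifold distance matters.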
Keywords/Search Tags: Clustering, Manifold, Evolutionary Algorithms, Image Segmentation, Watershed Algorithm