
Cluster Analysis And Its Application On Image Processing

Posted on: 2013-08-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Xiao
Full Text: PDF
GTID: 1228330395967935
Subject: Computer Science and Technology
Abstract/Summary:
As an unsupervised learning method, cluster analysis is one of the most important research fields in machine learning. In recent years, data clustering has developed rapidly, and cluster analysis has been used successfully in numerous applications, including image processing, text data mining, market research, and bioinformatics.

In this dissertation, we focus on two key problems of cluster analysis: similarity measures, and the design and application of new clustering algorithms. The goal of clustering is to group similar data points into clusters, so how similarity is defined and computed is crucial. Based on the existing Gaussian kernel similarity function, we propose a new similarity model. Besides the similarity model, we also discuss the effect of the features used in the similarity measure, and introduce the intrinsic dimension as a new feature to improve it. Different clustering problems call for fast and effective clustering algorithms. We discuss the advantages and disadvantages of existing clustering methods and propose a fast clustering algorithm that is applicable to image segmentation. Since most real images contain noise, we also propose a sparse representation-based denoising algorithm for mixed noise removal, in order to reduce the effect of noise on both image segmentation and subsequent image analysis.

The main contributions of this dissertation are as follows:

(1) A weighted self-adaptive Gaussian kernel similarity measure is proposed. The traditional Gaussian kernel similarity measure is suitable for data sets containing clusters of similar density, and it is not robust against outliers in the data. Since real data sets usually contain outliers and clusters of different densities, we propose a new robust Gaussian kernel similarity measure. Building on the existing self-adaptive Gaussian kernel similarity measure, the new measure assigns a weight to each data point according to its neighbor information. Outliers receive small weights, which reduces the similarities between outliers and other points. Experimental results show that the proposed similarity measure gives a better description of both intra-cluster and inter-cluster similarities, leading to better clustering results.

(2) We present a novel similarity measure based on intrinsic dimension. A similarity measure depends not only on the similarity model but also on the data features. Each cluster can be considered a sub-manifold, and the data points can be partitioned by defining a new feature that reflects the topological structure of the manifolds. In some cases, intrinsic dimension can be used to distinguish different manifolds, since data points in the same cluster are expected to have the same intrinsic dimension, while data points with different intrinsic dimensions should lie on different manifolds. The intrinsic dimension of each data point is estimated from its neighbor information and used as a new feature, together with the traditional features, for similarity computation. Experimental results show that the clustering results obtained with the new similarity measure are better than those based on similarity computed from the intrinsic dimension alone or from the original features alone.

(3) For data sets with complex structure, it is very difficult to obtain satisfactory clustering results by adjusting the similarity matrix with an unsupervised method. Semi-supervised clustering uses a limited amount of labeled data to guide the clustering process, which can yield better clustering results. In this dissertation, a semi-supervised clustering method based on the affinity propagation algorithm is proposed. Affinity propagation is a clustering algorithm based on a similarity matrix, and its performance can be improved by adjusting the similarity matrix according to known labeled data or pairwise constraints. The experimental results show that, by adding a small number of pairwise constraints, the semi-supervised affinity propagation method improves clustering accuracy over the unsupervised affinity propagation algorithm.

(4) A novel method for data clustering is presented based on Wittgenstein's family resemblance. Existing clustering algorithms based on a similarity matrix either have high time complexity or require parameter tuning. The new algorithm constructs an adjacency matrix from the given similarity matrix and finds the connected components of that adjacency matrix to partition the data. Compared with the commonly used spectral clustering methods based on a similarity matrix, the proposed method does not need to compute eigenvectors, which greatly reduces the computation time. Moreover, the new method has no parameters once the similarity matrix is given. Experimental results show that the proposed algorithm can be successfully applied to image segmentation, and the results are very encouraging.

(5) In order to reduce the effect of noise on both image segmentation and subsequent image analysis, we propose a sparse representation-based denoising algorithm for mixed noise removal. The new algorithm combines a median-type filter with a dictionary learning method and optimizes the proposed l1-l0 model via a three-phase method. It uses double sparsity to make a double construction, leading to an enhanced restoration. Experimental results show that the new method makes a notable improvement in both impulse noise removal and Gaussian-impulse mixed noise removal.
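
The general idea behind contributions (1) and (4), building a self-adaptive Gaussian similarity matrix and then partitioning the data by the connected components of an adjacency matrix derived from it, can be illustrated with a minimal Python sketch. The abstract does not give the exact kernel, the weighting scheme, or the rule for constructing the adjacency matrix, so everything specific here is an assumption: the kernel is the commonly used local-scaling form s_ij = exp(-d_ij^2 / (sigma_i * sigma_j)) without the proposed weights, the adjacency rule simply links each point to its single most similar neighbor, and all function names are hypothetical. It is not the dissertation's algorithm.

```python
import math
from collections import deque


def local_scale_similarity(points, k=2):
    """Gaussian kernel with one scale per point (assumed local-scaling form).

    s_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance
    from point i to its k-th nearest neighbour. Assumes len(points) > k and
    no duplicated points (otherwise sigma_i could be zero).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the zero self-distance, so index k is the k-th neighbour.
    sigma = [sorted(row)[k] for row in dist]
    return [
        [math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j])) for j in range(n)]
        for i in range(n)
    ]


def most_similar_adjacency(sim):
    """Hypothetical rule: link every point to its most similar other point."""
    n = len(sim)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        best = max((j for j in range(n) if j != i), key=lambda j: sim[i][j])
        adj[i][best] = adj[best][i] = True  # keep the graph undirected
    return adj


def connected_components(adj):
    """Label each node with the index of its connected component (BFS)."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier start node
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adj[i][j] and labels[j] == -1:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


# Two well-separated groups of one-dimensional points.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
sim = local_scale_similarity(points, k=2)
labels = connected_components(most_similar_adjacency(sim))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No eigenvectors are computed at any point: once the similarity matrix is available, the partition comes from a single graph traversal, which is the source of the speed advantage over spectral clustering that the abstract describes. The nearest-neighbor linking rule used here can split a cluster into several components on less regular data, so it should be read only as a placeholder for the adjacency construction that the dissertation defines.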
Keywords/Search Tags: Cluster analysis, Similarity measure, Intrinsic dimension, Semi-supervised clustering, Family resemblance, Image segmentation, Image denoising