Font Size: a A A

Research On Affinity Propagation Clustering Algorithm

Posted on:2012-01-27Degree:MasterType:Thesis
Country:ChinaCandidate:X L FengFull Text:PDF
GTID:2218330371962535Subject:Military communications science
Abstract/Summary:PDF Full Text Request
Affinity Propagation (AP) algorithm is becoming increasingly popular in recent years as an efficient and fast clustering algorithm. Comparing with other traditional clustering algorithms, AP can be completed in a shorter time on clustering large-scale and multi-class data sets. Because of no requirement for the symmetry of the similarity matrix, it can be a good solution to non-Euclidean space. Therefore, AP has attracted considerable attentions and continuous discussions since its publication.However, there are two important shortcomings of AP. One is that AP is only suitable for the cluster on compact Hyperspherical structure datasets but has poor performance on the loose and complex structure datasets; the other one is that AP is an unsupervised learning method and cannot applicable directly to the clustering under constraints in practice. To solve these problems, this paper carries out an exploratory research on clustering technology.Several approaches are presented to improve the efficiency of AP by adjusting the similarity matrix. Kernel Shared Affinity Propagation (KSAP) algorithm is proposed to solve the asymmetrical density dataset and some unseparated linear data clustering. Futherly, there are two algorithms, Affinity Propagation Clustering based on Manifold Distance (APMD) and APMD based on Multi-Kernel Modeling (MKAPMD), which are designed for arbitrary shape and structure datasets clustering. In the end, with the consideration of constraints, Semi-Supervised Affinity Propagation Clustering with Pairwise Constraints (SAP-PC) algorithm is put forward.The innovations of this work include:1. Kernel Shared Affinity Propagation algorithm is proposed. Using the characteristic of data shared by nearest neighbors that is insensitive with the local density, a new similarity, shared nearest neighbors based on Gaussian kernel (KSNN), is designed to overcome the shortcoming of traditional similarity based on Euclidean distance that is only suit for clustering compact datasets. On the basis, KSAP is proposed. The simulation results show that KSAP can handle asymmetrical density dataset and some unseparated linear data clustering.2. Affinity Propagation Clustering based on Manifold Distance is proposed. In a scenario of local-consistent and global-consistent hypothesis, a locality preserving manifold distance is designed and APMD algorithm is proposed, to improve AP method that can only be used for clustering Hyperspherical datasets. APMD overcomes the defect of the original algorithm that can't clusters non-convex structure datasets. Futherly, a fast APMD algorithm is given to greatly reduce the computational complexity as well as maintain the performance of APMD. The simulation results show that the proposed algorithms settle the problem of clustering non-convex datasets effectively.3. Multi-kernel modeling by APMD (MKAPMD) is offered. To eliminate the flaw of APMD algorithm with single-kernel having poor performance on mixture model clustering, a multi-kernel model based on the manifold of datasets is established. And according to the manifold boundary, the parameters of kernel functions are adaptively optimized to fit the structure of datasets well. On this basis, The MKAPMD algorithm is proposed, which can settle the problem of clustering a variety of mixing manifold datasets and broaden the application scope of AP. The simulation results show the validity of this algorithm.4. Semi-Supervised Affinity Propagation clustering with pairwise constraints is proposed. Based on the manifold distance matrix, the Pairwise constraints were spread in the feature space by their propagation characteristics, and described by modifying the distance matrix of datasets; consequently, the application limit of AP to the clustering with constraints in practice is resolved. In addition, opening operation and closing operation was carried on the pairwise constraints to find the closure centers, which are used to replace the pairwise constraints. The proposed algorithm on this basis is SAP-PC that effectively solves the problem of constraint violation and improves the performance of AP.
Keywords/Search Tags:Clustering Analysis, Affinity Propagation, Similarity Measure, Kernel Technique, Shared Nearest Neighbor, Manifold Distance, Mixture Model, Constraints Clustering
PDF Full Text Request
Related items