Font Size: a A A

The Research And Application Of The MEAP Clustering Algorithm

Posted on:2016-04-10Degree:MasterType:Thesis
Country:ChinaCandidate:L L ChenFull Text:PDF
GTID:2308330464465030Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the development of computer and communication technology, digital information obtained by people is increasing very fast. In order to extract valuable refined information, data mining technology has got more and more attention and research. As an important basic technology of data mining, cluster analysis has obtained rapid development. In 2007, Frey et al. proposed a brand-new center-based partition clustering algorithm, AP clustering algorithm. This algorithm takes all data points as potential exemplars at first and aims to maximize the sum of the similarity between all data points and their exemplars.This method automatically generates the exemplars through passing message between data points and competing with each other. This algorithm has been successfully applied to many areas for its advantages such as not affected by its initialization, no restriction on the similarity matrix, fast running speed, high cluster accuracy. But this algorithm still has some drawbacks which deserve deep research. AP algorithm is not suitable to handle multi-subclass data sets. To address this flaw, Wang extended this method to multi-exemplar affinity propagation(MEAP). MEAP algorithm is a center-based clustering method in nature too. When dealing with arbitrary shape data sets with manifold structure, MEAP algorithm cannot obtain good clustering result. To overcome this shortcoming, this paper proposes some improvements based on the idea of manifold. The innovations and work contents are as follows:Firstly, a brand new similarity measure based on the idea of manifold structure is presented. First of all, it changes the distance between all data points through the exponential transformation in order to amplify the distance of data points between different manifolds. And then, it uses DBSCAN to search the manifold structure inside the data sets, and it decreases the distance between data points of the same manifolds structure. Based on the above changes, taking the opposite value, we get a new similarity matrix, which can reflect the distributed structure inside the data sets. Combined this similarity matrix with MEAP, we have the manifold structure based multi-exemplar affinity propagation(MS-MEAP).The results on artificial datasets and USPS digits datasets show the effectiveness of this method when dealing with this kind of data sets.Secondly, this paper proposes a novel similarity measure based on neighbor propagation. This similarity can connect two points which are far away by an intermediate point through the neighbor propagation. As a result, it can magnify the similarity of data points belonging to the same manifold structure. On the other hand, through the exponential transformation the similarity between two data points in different manifold structure is reduced. As a consequence of the neighbor propagation, the similarity between data objects is optimized, thus reflecting the distribute structure and interrelation among data objects more precisely. The proposed algorithm NP-MEAP can achieve completely correct cluster results in the eight artificial dataset. The experiment results on USPS digits datasets also show NP-MEAP has better performance.Thirdly, this paper applies the NP-MEAP algorithm to the image clustering problem. First of all, for the face clustering problem, NP-MEAP algorithm improves the accurary of the face clustering problem by the combination of SIFT features and neighbor propagation. The SIFT algorithm extracts and matches the features between images very slowly. So a two level clustering method is proposed to solve this problem while it does not decrese the accrary much. After the face clustering, NP-MEAP is applied to cluster the Chinese traditional moire image. First the moire image is preprocessed to uniform the image size, remove the background noise and refine the line of the moire image. Using the neighbor propagation to optimize the similariy matrix extracted by the SC algorithm, the clustering results prove that NP-MEAP outperforms SIFT-MEAP and ED-MEAP.
Keywords/Search Tags:AP algorithm, MEAP algorithm, similarity matrix, manifold structure, DBSCAN algorithm, neighbor propagation, image clustering
PDF Full Text Request
Related items