
The Study of Modified Affinity Propagation Clustering and Its Application

Posted on: 2018-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: D Tang
Full Text: PDF
GTID: 2348330512478575
Subject: Statistics
Abstract/Summary:
Cluster analysis is a key branch of multivariate statistical analysis and is widely used in many fields. Affinity propagation (AP) clustering is an unsupervised clustering algorithm proposed by Frey and Dueck in 2007. It requires neither the initial cluster centers nor the number of clusters to be specified in advance. Once the similarity matrix and the preference (P) have been constructed, the algorithm automatically identifies suitable cluster centers through message passing. Preliminary studies show that the algorithm has several advantages, such as fast computation, a low sum of squared errors, and high clustering accuracy. However, it also has some disadvantages. First, the AP algorithm uses the negative Euclidean distance as its similarity measure, but the Euclidean distance is suitable only for independent samples, is sensitive to the units of measurement of each attribute, and weights every attribute equally in the distance. This paper proposes a weighted Mahalanobis distance based on the mean square error and takes the negative of this distance as the similarity measure of the AP algorithm. The Mahalanobis distance adapts to the distribution of the data, and the weighting based on the mean square error takes the influence of each attribute into account. This not only broadens the scope of application of the algorithm but also improves the accuracy of the clustering results. Second, the AP algorithm sets P to the same value for every point, which gives all data points the same chance of becoming a class representative but ignores the effect of the data distribution on which points become representatives. To address this defect, this paper proposes setting P for each point from the sum of the memberships of all other points to that point: the greater the sum, the more likely the point is to become a class representative. Setting P according to the data distribution assigns a higher P value to the points that are more likely to become class representatives, which reduces both the number of iterations and the running time. Finally, in order to obtain clusterings with 1 to k clusters, an adaptive step size that dynamically adjusts the P value is proposed, together with the Gap index for estimating the optimal number of clusters.
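The abstract does not give the exact formula of the weighted Mahalanobis distance, so the following is only a minimal sketch under stated assumptions. It assumes the form d_w(x, y)^2 = (x - y)^T W^(1/2) C^(-1) W^(1/2) (x - y), where C is the sample covariance and W is a diagonal matrix of attribute weights. The function names, the example weights, and the toy data are hypothetical, and the weights are passed in by hand rather than derived from the mean square error as in the thesis. The sketch builds the negative-distance similarity matrix and places per-point preferences on its diagonal, which is where AP reads them.

```python
import numpy as np


def weighted_mahalanobis_similarity(X, weights=None):
    """Return S with S[i, j] = -d_w(x_i, x_j) for the rows of X.

    Assumed form (not taken from the thesis):
        d_w(x, y)^2 = (x - y)^T W^(1/2) C^(-1) W^(1/2) (x - y)
    where C is the sample covariance and W = diag(weights).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    # Pseudo-inverse keeps the sketch usable when C is singular.
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    # Pairwise differences, scaled per attribute by sqrt(weight).
    diffs = (X[:, None, :] - X[None, :, :]) * np.sqrt(w)
    d2 = np.einsum("ijk,kl,ijl->ij", diffs, cov_inv, diffs)
    return -np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off


def with_preferences(S, preferences):
    """Copy S and place per-point preferences on the diagonal, s(k, k) = P_k."""
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, preferences)
    return S


# Hypothetical toy data: two Gaussian blobs in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(6.0, 1.0, (20, 2))])
S = weighted_mahalanobis_similarity(X, weights=[0.7, 0.3])
# Rescaling an attribute (e.g. metres to millimetres) leaves S unchanged.
S_rescaled = weighted_mahalanobis_similarity(X * [1.0, 1000.0], weights=[0.7, 0.3])
# A uniform preference (median off-diagonal similarity), a common default for AP.
P = np.full(len(X), np.median(S[~np.eye(len(X), dtype=bool)]))
S_ap = with_preferences(S, P)
```

Replacing the uniform vector `P` with point-specific values is where a membership-based preference like the one described in the abstract would plug in. A matrix like `S` can also be handed to an existing AP implementation. For example, scikit-learn's `AffinityPropagation` accepts `affinity="precomputed"` and a per-sample `preference` array.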
Keywords/Search Tags: Affinity Propagation Clustering, weighted Mahalanobis distance, membership, Preference, Gap statistic