Research On Optimization Of Density Peaks Clustering Algorithm And Its Privacy Preserving

Posted on: 2020-02-23 | Degree: Master | Type: Thesis
Country: China | Candidate: S T Bao | Full Text: PDF
GTID: 2428330575962406 | Subject: Computer Science and Technology
Abstract/Summary:
Data mining can discover valuable patterns and knowledge hidden in large amounts of data, and clustering analysis is an important research method in the field of data mining. As an unsupervised learning method for data analysis, clustering analysis aims to divide samples into clusters according to the similarity between them, such that samples within the same cluster are highly similar and samples belonging to different clusters are dissimilar. It has been widely used in fields such as pattern recognition, image processing and community detection. Clustering analysis is a process of mining and reusing data. If data containing sensitive information is maliciously exploited by attackers, it can seriously harm the personal property and reputation of users.

Density peaks clustering is a density-based clustering algorithm that can discover clusters of arbitrary shape without iteration, and its clustering process is relatively simple and efficient. However, the density peaks clustering algorithm also has some shortcomings. It is sensitive to the global parameter d_c. The allocation of the remaining samples can cause a "domino effect", which leads to incorrect allocation of samples. It may also reveal user privacy when calculating the local density and shortest distance of samples. To solve the above problems, the main work and innovations of this thesis are as follows:

(1) To address the sensitivity of the density peaks clustering algorithm to the global parameter d_c, a density peaks clustering algorithm based on shared near neighbors similarity was proposed. Firstly, Euclidean distance and shared near neighbors similarity were combined to calculate the local density of each sample, which avoided setting the parameter d_c. Secondly, the selection of cluster centers was optimized so that the initial cluster centers were selected adaptively. Finally, each remaining sample was assigned to the same cluster as its nearest neighbor with higher density. Experimental results on both UCI and synthetic datasets show that the proposed algorithm can effectively improve the accuracy of the clustering method and the quality of the clustering results.

(2) To address the "domino effect" caused by the density peaks clustering algorithm relying only on local density to assign the remaining samples, a density peaks clustering algorithm based on the gravitational search algorithm was presented. The optimized cluster centers selected by the density peaks clustering algorithm with the density measure were used as the initial agents. The best clustering distribution was then obtained according to the distance criterion of the gravitational search algorithm. Experimental results on both UCI and synthetic datasets demonstrate that the proposed algorithm achieves a better clustering effect.

(3) To address the risk that the density peaks clustering algorithm reveals user privacy when calculating the local density and shortest distance of samples, a differential privacy preserving density peaks clustering algorithm was proposed. To protect privacy, Laplace noise was added to the calculation of the local density and the shortest distance. The privacy security analysis proved that the method satisfies differential privacy. Experimental results on both UCI and synthetic datasets indicate that the proposed algorithm prevents privacy leakage without reducing the availability of the data.
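
The two quantities the abstract refers to, local density and shortest distance, can be illustrated with a minimal sketch. It assumes the standard cutoff-kernel definition of density peaks clustering (count of other samples within d_c), not the shared near neighbors density of contribution (1), and it adds Laplace noise with scale sensitivity/epsilon in the spirit of contribution (3). The function name `density_peaks`, the parameters `epsilon` and `sensitivity`, and the default sensitivity of 1.0 are assumptions made for illustration; the thesis's actual noise calibration is not given in the abstract.

```python
import numpy as np

def density_peaks(X, d_c, epsilon=None, sensitivity=1.0, rng=None):
    """Return (rho, delta) for each sample.

    rho   : local density, here the number of other samples closer than d_c
    delta : distance to the nearest sample with strictly higher density
            (for a sample with no denser neighbor, its largest distance)
    If epsilon is given, Laplace noise with scale sensitivity/epsilon is
    added to both quantities (illustrative assumption, not the thesis's
    exact mechanism).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distance matrix, shape (n, n)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Cutoff-kernel density; subtract 1 so a sample does not count itself
    rho = (dist < d_c).sum(axis=1).astype(float) - 1.0

    if epsilon is not None:
        rng = np.random.default_rng() if rng is None else rng
        rho = rho + rng.laplace(0.0, sensitivity / epsilon, size=n)

    # Shortest distance to any sample of higher (possibly noisy) density
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()

    if epsilon is not None:
        delta = delta + rng.laplace(0.0, sensitivity / epsilon, size=n)

    return rho, delta
```

In the usual formulation, samples with both large rho and large delta are taken as cluster centers, and each remaining sample joins the cluster of its nearest denser neighbor, which is the allocation step the abstract associates with the "domino effect". Smaller values of the assumed `epsilon` add more noise, so stronger privacy is traded against the accuracy of both quantities.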
Keywords/Search Tags:density peaks clustering, shared near neighbors similarity, gravitational search algorithm, differential privacy