Research On Differential Privacy Protection Clustering Methods For Laplacian Mechanism

Posted on: 2021-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: G H Chu
Full Text: PDF
GTID: 2518306032967769
Subject: Computer technology
Abstract/Summary:
With the rapid development of the artificial intelligence industry, data mining, an important branch of artificial intelligence, has come into wide use. Cluster analysis is one of the central algorithms in data mining and plays an important role in many industries. However, while clustering algorithms uncover latent associations in data, they may also disclose users' private information. It is therefore of great significance to protect private information when clustering algorithms are used for data mining. Differential privacy is a privacy protection method defined against an extremely strict attack model, and it can be applied to the cluster analysis process. Privacy is protected by adding Laplacian noise to the data, with the noise calibrated so that the data remain usable. To address the low data availability and insufficient privacy of existing clustering algorithms based on differential privacy protection, this thesis does the following work:

(1) To address the blind random selection of initial center points and the sensitivity to outliers in the k-means clustering algorithm, a clustering method based on outlier detection and optimized initial center point selection (OPT k-means) is proposed. For outlier detection, the method uses the box-type isolation forest algorithm (IFAB) proposed in this thesis. For initial center point selection, it uses the idea of the farthest centroid distance and combines the decision distance and decision set into an initial center point selection algorithm (IPS). During clustering, interference from outliers is reduced, and the initial center points are distributed across different clusters, as close as possible to the cluster centers. Experiments show that this method improves the clustering result while also reducing running time.

(2) To address privacy leakage during cluster analysis, an OPT k-means algorithm based on differential privacy protection (OPTDP k-means) is proposed. When adding noise to the initial center points, the algorithm uses the initial center point noise algorithm (DP-IPS) proposed in this thesis. Adding Laplacian noise to the sample points while the algorithm runs reduces the risk of privacy leakage during cluster analysis. Experiments show that, while protecting privacy, the algorithm better preserves the availability of the data and improves efficiency.

(3) To address the insufficient security of the traditional DP-DBSCAN algorithm, an improved DP-DBSCAN algorithm (IDP-DBSCAN) based on a modified noise-adding method is proposed. The algorithm changes how the traditional DP-DBSCAN algorithm adds noise and adds Laplacian noise to the core objects, which further reduces the risk of privacy leakage. Experiments show that the algorithm achieves a high degree of security while largely preserving the data availability of the traditional DP-DBSCAN algorithm.

(4) To address problems of the traditional CURE algorithm, such as the large amount of computation when selecting representative points and the subjectivity of outlier detection, an improved CURE algorithm based on decision distance (I-CURE) is proposed. The algorithm uses the decision distance and decision set to detect outliers and select representative points, and uses the outlier redistribution algorithm (OLRB) to reassign the outliers. To address privacy leakage in the CURE and I-CURE clustering algorithms during cluster analysis, a CURE algorithm based on differential privacy protection (DP-CURE) and an I-CURE algorithm based on differential privacy protection (DP-I-CURE) are proposed. Experiments show that the I-CURE algorithm achieves better time efficiency and clustering accuracy, and that the DP-CURE and DP-I-CURE algorithms preserve the availability of the data while satisfying the security requirement.
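
The abstract does not spell out how Laplacian noise enters the clustering step, so the following is only a minimal sketch of the general Laplace mechanism applied to one k-means centroid update, not the thesis's DP-IPS or OPTDP k-means procedure. The mechanism adds noise with scale b = sensitivity / epsilon to a numeric query. The function names (`laplace_noise`, `noisy_centroids`), the assumption that every coordinate is normalized to [0, 1], and the joint sensitivity of d + 1 for the per-cluster (sum, count) query are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_centroids(points, labels, k, epsilon, rng):
    """One noisy centroid update (hypothetical helper, not from the thesis).

    Assumes each coordinate lies in [0, 1]. Adding or removing one record then
    changes a cluster's coordinate sums by at most d in L1 norm and its count
    by 1, so the joint (sum, count) query has sensitivity d + 1.
    """
    d = len(points[0])
    scale = (d + 1) / epsilon  # Laplace scale b = sensitivity / epsilon
    sums = [[0.0] * d for _ in range(k)]
    counts = [0.0] * k
    for point, j in zip(points, labels):
        counts[j] += 1.0
        for i, x in enumerate(point):
            sums[j][i] += x
    centroids = []
    for j in range(k):
        # Clamp the noisy count so the division stays well defined.
        noisy_count = max(counts[j] + laplace_noise(scale, rng), 1.0)
        noisy_sum = [s + laplace_noise(scale, rng) for s in sums[j]]
        # Clip back into the assumed data range [0, 1].
        centroids.append([min(max(s / noisy_count, 0.0), 1.0) for s in noisy_sum])
    return centroids

if __name__ == "__main__":
    rng = random.Random(0)
    points = [[0.1, 0.1], [0.2, 0.2], [0.8, 0.9], [0.9, 0.8]]
    labels = [0, 0, 1, 1]
    print(noisy_centroids(points, labels, k=2, epsilon=1.0, rng=rng))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of less accurate centroids, which is the availability trade-off the abstract refers to. Under the same assumptions, repeating this update over several iterations spends the privacy budget on each iteration, so the per-iteration epsilon would have to be a share of the total budget.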
Keywords/Search Tags: differential privacy, Laplacian noise, k-means, DBSCAN, CURE