
Research On The Privacy Protection Algorithm Of Clustering Based On Differential Privacy

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Wei
Full Text: PDF
GTID: 2518306512956299
Subject: Computer software and theory
Abstract/Summary:
In the current era of information sharing, privacy protection in data mining and data publishing has been a focus of in-depth research in information security. Anonymization is one of the main privacy protection technologies at present: it can effectively reduce the risk of leaking users' private information while ensuring that the data retains a certain degree of authenticity and availability. Differential privacy is another research hotspot. It rests on rigorous mathematical proofs and overcomes two shortcomings of traditional privacy protection models: their reliance on assumptions about the attacker's background knowledge and their inability to quantify the level of privacy protection. However, privacy protection technologies tend to reduce the availability of data, so how to balance privacy protection and data availability is a problem that urgently needs to be solved. This dissertation addresses the poor data availability and low execution efficiency of the DCMDP (density-based clustering mechanism with differential privacy) algorithm. After an in-depth analysis of the factors that limit the algorithm's availability and efficiency, improved algorithms are designed. The main work and results are as follows:

1) To solve the over-generalization of data caused by anonymization in the DCMDP algorithm, a K-density-based clustering mechanism with differential privacy (KDCMDP) algorithm is designed. The algorithm adopts the idea of micro-aggregation: each cluster obtained by DBSCAN clustering is further divided by k into smaller equivalence classes of similar records. The number of records in each equivalence class is kept between k and 2k-1, which realizes anonymization with an optimal k-division. During DBSCAN clustering, a distance similarity matrix is constructed to store the distance between any two points in the data set, which solves the problem of the large amount of time consumed by repartitioning. For each equivalence class, the number of records and the total value are calculated, Laplace noise is added to each of them, the centroid of the equivalence class is updated accordingly, and the new centroid replaces the values of the records in the equivalence class. Experimental results show that the KDCMDP algorithm incurs less information loss than the DCMDP algorithm and offers higher data availability.

2) To address the low efficiency of the KDCMDP algorithm caused by clustering, the KGDCMDP (k group density-based clustering mechanism with differential privacy) algorithm is designed. The algorithm uses a grouping algorithm to divide the data set into multiple groups and then clusters the groups, achieving high cohesion of data within each cluster and low coupling of data between clusters. Finally, k-division is performed on each group in each generated cluster. The KGDCMDP algorithm reduces the number of neighborhood queries in the grouping and clustering process, and dividing each group separately reduces the number of distance calculations. Experimental results show that the KGDCMDP algorithm is more efficient than the KDCMDP algorithm.
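
The release step described for KDCMDP (split a cluster into equivalence classes of k to 2k-1 records, add Laplace noise to each class's record count and total value, then replace the records with the noisy centroid) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Several details are assumptions made only for illustration: the function names, the simple sort-and-chunk split standing in for the thesis's k-division, the even split of the privacy budget `epsilon` between count and sum, and attributes normalized to [0, 1] so that the sum has L1 sensitivity equal to the number of attributes. The DBSCAN step and the distance matrix are omitted.

```python
import random


def split_into_groups(records, k):
    """Split records into groups whose sizes all lie in [k, 2k-1].

    Assumption: a plain sort-and-chunk split, not the thesis's k-division.
    """
    if len(records) < k:
        raise ValueError("need at least k records")
    ordered = sorted(records)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups[-1]) < k:
        # Merge a short tail (< k records) into the previous group,
        # which then holds at most 2k-1 records.
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(group, epsilon, rng):
    """Centroid computed from a Laplace-perturbed count and per-attribute sums."""
    dims = len(group[0])
    half = epsilon / 2.0  # assumed: budget split evenly between count and sum
    noisy_count = len(group) + laplace_noise(1.0 / half, rng)
    noisy_count = max(noisy_count, 1.0)  # guard against a non-positive count
    noisy_sums = [
        sum(r[j] for r in group) + laplace_noise(dims / half, rng)
        for j in range(dims)
    ]
    return tuple(s / noisy_count for s in noisy_sums)


def anonymize_cluster(cluster, k, epsilon, rng=None):
    """Replace every record by the noisy centroid of its equivalence class."""
    rng = rng or random.Random()
    released = []
    for group in split_into_groups(cluster, k):
        centroid = noisy_centroid(group, epsilon, rng)
        released.extend([centroid] * len(group))
    return released
```

Calling `anonymize_cluster(cluster, k=3, epsilon=1.0)` on a list of equal-length numeric tuples returns one output record per input record, with all records of an equivalence class sharing the same perturbed centroid. The sketch makes no claim about reproducing the thesis's privacy analysis or its information-loss results.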
Keywords/Search Tags: privacy protection, anonymization, differential privacy, DCMDP, DBSCAN