
Research On Privacy-preserving Clustering Based On Differential Privacy

Posted on: 2022-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Su
GTID: 2518306740495004
Subject: Computer technology
Abstract/Summary:
Data mining can find latent rules and patterns in data and provide support for decision making. Clustering is one of its basic functions. Because clustering requires access to business data, it inevitably carries a risk of privacy leakage. As people pay more attention to personal privacy, how to perform clustering while protecting data privacy has become an urgent problem. Differential privacy is an effective privacy protection technique that has attracted continuous attention from researchers in recent years. To address the shortcomings of existing differential privacy based clustering methods in privacy security and clustering quality, this thesis proposes a distance matrix perturbation method based on differential privacy, the KQ-Means clustering method based on the perturbed matrix, and the MCP-DBSCAN clustering method. The aim is to improve the accuracy of clustering results and to balance data privacy against clustering usability. The main work of the thesis is as follows:

(1) Existing k-means clustering methods based on differential privacy add noise only to the computation of the cluster centers and do not consider the privacy leakage risk of non-center points during clustering. The LDM method is proposed to add noise to the distances between the data points of the original data set. The pairwise distances are extracted from the original data set, noise satisfying the differential privacy constraint is added, and the noise matrix Dist M is constructed, which protects the privacy of the distances between points. The noise matrix, rather than the original data, is provided to untrusted parties. The KQ-Means clustering method is then designed on top of the noise matrix. It introduces the concept of k nearest neighbors and improves the partition step so that each data record is assigned, within an expected range, to one of its q nearest centers. This reduces the clustering error caused by the accumulation of differential privacy noise over multiple iterations and improves the clustering result.

(2) Existing privacy-preserving DBSCAN clustering methods based on differential privacy have two problems. The neighborhood radius parameter eps is difficult to set, which affects the quality of the private clustering, and clusters are formed around individual core points and then merged, so the differential privacy noise seriously degrades clustering accuracy. To address these problems, the MCP-DBSCAN clustering method based on differential privacy is proposed. The distances between points are computed and combined with the ascending k-distance curve to select an appropriate neighborhood radius parameter eps, which improves clustering accuracy. The columns of the noisy distance matrix are sorted in ascending order to construct the Asc matrix, and the MinPts-th row vector of the Asc matrix is compared with the neighborhood radius parameter to determine the core points in turn. When a sample point is associated with several core points, the cluster it belongs to is determined by the farthest-distance principle. In this way the method takes both privacy protection and the efficiency and accuracy of density clustering into account.

Theoretical analysis and experimental results show that the proposed methods can effectively maintain the accuracy of clustering results while avoiding privacy leakage of the data set during clustering.
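The two matrix-based ideas in the abstract, perturbing pairwise distances and reading eps from an ascending k-distance curve, can be illustrated with a minimal sketch. This is not the thesis's LDM or MCP-DBSCAN implementation. The function names, the use of Laplace noise, the `sensitivity` parameter, and the clamping of noisy distances at zero are all assumptions made for illustration, and the sketch does not account for how the privacy budget composes across matrix entries.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distance_matrix(points, epsilon, sensitivity, seed=None):
    """Return a symmetric matrix of Euclidean distances with Laplace noise added.

    Hypothetical helper: `sensitivity` is an assumed bound on how much one
    record can change a distance (e.g. the diameter of normalised data).
    """
    rng = random.Random(seed)
    n = len(points)
    scale = sensitivity / epsilon
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            # Clamp at zero so the released value is still a valid distance.
            noisy = max(0.0, d + laplace_noise(scale, rng))
            dist[i][j] = dist[j][i] = noisy
    return dist


def k_distance_curve(dist, k):
    """Ascending list of each point's distance to its k-th nearest neighbour."""
    curve = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        curve.append(others[k - 1])
    return sorted(curve)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.9, 0.9), (1.0, 1.0)]
    noisy = noisy_distance_matrix(pts, epsilon=2.0, sensitivity=1.0, seed=42)
    # A sharp rise ("knee") in this curve is a common heuristic for eps.
    print(k_distance_curve(noisy, k=2))
```

Only the noisy matrix would be handed to the party that runs the clustering, so both the partition step and the eps selection work on perturbed distances rather than on the original records.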
Keywords/Search Tags: Clustering Analysis, Privacy-preserving, Differential Privacy, k-means Clustering, DBSCAN Clustering