
Differentially Private K-means Clustering

Posted on: 2018-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Liu
GTID: 2348330542987337
Subject: Software engineering
Abstract/Summary:
With the arrival of the big data era, data mining technology is being applied in many fields. While people enjoy the convenience of data mining services, the problem of data privacy disclosure is becoming more and more prominent. As an emerging privacy protection technology, differential privacy has received much attention and has been studied extensively because it offers rigorous, provable guarantees. The K-means clustering algorithm is one of the core techniques in data mining, and it carries a risk of privacy disclosure. Current methods for differentially private K-means clustering have two deficiencies. First, the initial center points are selected at random, which reduces the stability of the clustering result. Second, the privacy budget allocation scheme lacks a rigorous theoretical justification. This thesis proposes the DPC (Differentially Picked Center) method to select more reasonable initial center points for the K-means clustering algorithm. It also proposes the DSNDN (Different SUM and NUM Different Noise) method to optimize the privacy budget allocation scheme in differentially private K-means clustering. The main contents of this thesis are as follows:

(1) To address the sensitivity of the K-means clustering result to the initial center points, a method for selecting the initial centers is proposed. By combining the exponential mechanism of differential privacy with density estimation, it provides more reasonable initial center points for the K-means clustering algorithm while preserving data privacy, and it improves the stability of the clustering result.

(2) Based on an error analysis, the DSNDN method is proposed to minimize the mean squared error (MSE) of the cluster centers after random noise is added, so that the privacy budget for each cluster center is allocated rationally. The privacy budget allocation scheme used by the DSNDN algorithm to protect K-means clustering is derived in detail from the composition properties of differential privacy.

(3) Comparison experiments on UCI (University of California Irvine) data sets show that the DPC method and the DSNDN method are practical and effective.
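The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's DPC or DSNDN algorithm, whose details the abstract does not give. It assumes that points are scaled to [0, 1]^d, that neighboring data sets differ by adding or removing one point, that candidate centers come from a data-independent set, and that the density score is a simple count of points within a radius. The function names and the parameters `radius`, `eps_sum`, and `eps_num` are hypothetical.

```python
import numpy as np

def density_scores(points, candidates, radius):
    """Hypothetical density score: number of points within `radius` of each candidate.
    For fixed candidates, adding or removing one point changes each score by at most 1."""
    dists = np.linalg.norm(points[None, :, :] - candidates[:, None, :], axis=2)
    return (dists <= radius).sum(axis=1).astype(float)

def exponential_mechanism_pick(scores, epsilon, sensitivity, rng):
    """Exponential mechanism: sample index i with probability
    proportional to exp(epsilon * score_i / (2 * sensitivity))."""
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()  # shift for numerical stability; probabilities unchanged
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def noisy_center(points, eps_sum, eps_num, rng):
    """Laplace mechanism on a cluster's coordinate sum (SUM) and size (NUM),
    each with its own share of the privacy budget.
    Assumes points lie in [0, 1]^d, so the L1 sensitivity of SUM is d and of NUM is 1."""
    n, d = points.shape
    noisy_sum = points.sum(axis=0) + rng.laplace(scale=d / eps_sum, size=d)
    noisy_num = n + rng.laplace(scale=1.0 / eps_num)
    noisy_num = max(noisy_num, 1.0)  # guard against zero or negative noisy counts
    return np.clip(noisy_sum / noisy_num, 0.0, 1.0)

rng = np.random.default_rng(0)
data = rng.random((200, 2))                       # toy data in [0, 1]^2
grid = np.array([[x, y] for x in (0.25, 0.75) for y in (0.25, 0.75)])
first = exponential_mechanism_pick(
    density_scores(data, grid, radius=0.2), epsilon=1.0, sensitivity=1.0, rng=rng)
center = noisy_center(data, eps_sum=0.5, eps_num=0.5, rng=rng)
```

By sequential composition, one call to `noisy_center` consumes `eps_sum + eps_num` of the budget. How that total is divided between SUM and NUM is the kind of allocation question the DSNDN method addresses. The even split shown here is only a placeholder.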
Keywords/Search Tags:Differential Privacy, Cluster analysis, K-means Cluster, Initial center, Privacy Budget