A Power Iterative Clustering Method Based On Differential Privacy

Posted on: 2019-07-06  Degree: Master  Type: Thesis
Country: China  Candidate: M Zhao  Full Text: PDF
GTID: 2428330548994973  Subject: Software engineering
Abstract/Summary:
In recent years, clustering, an unsupervised data mining method, has been widely used to extract information from data, with features serving as the basic attributes on which samples are grouped. Combined with methods and theory from linear algebra, many feature extraction techniques have emerged that further improve the quality of feature extraction and the accuracy of clustering. In this thesis, we use the simple and fast power method to obtain the characteristics of a data set and cluster the samples using the iterated eigenvectors, that is, power iterative clustering. However, data sets often contain private or sensitive information, and once the data has been released it is difficult to prevent that information from being exposed. The differential privacy model does not depend on assumptions about an attacker's background knowledge: whatever background knowledge the attacker holds, the level of privacy protection is quantified by a parameter, so that private information stays secure while the clustering remains effective. The specific perturbation method must be matched to the clustering algorithm so that the privacy of the data set is guaranteed while its availability is kept as high as possible.

In this thesis, we propose a power iterative clustering algorithm based on differential privacy. To address privacy leakage from the eigenvectors during iteration and from the cluster centers of the features, the differential privacy model is integrated into the power iterative clustering algorithm at these two levels. Traditional differential privacy techniques perturb the data, which reduces the clustering quality of the algorithm and can easily change the direction in which the eigenvectors converge. Therefore, noise drawn from the Laplace distribution is added in turn to the attribute values of the eigenvectors during iteration, and a reasonable privacy budget ε is set experimentally to overcome this defect of the traditional approach. Finally, using the sequential composition property of differential privacy, the proposed algorithm is proved to satisfy ε-differential privacy, and the algorithm procedure and implementation code are given. The experiments vary ε to find the privacy budget that gives the best clustering effect, and compare the clustering results of power iterative clustering with and without differential privacy. Although differential privacy degrades the clustering effect under certain conditions, the availability of the results on the experimental data sets remains high. In addition, the clustering quality of the algorithm is tested under different parameters and compared with existing differentially private clustering algorithms, showing clear advantages on large data sets.
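The abstract does not reproduce the thesis code, so the following is only a minimal sketch of the general idea: power iteration on a row-normalized affinity matrix, with Laplace noise added to the iterated vector at every step, followed by clustering of the resulting one-dimensional embedding. Everything specific here is an assumption made for illustration: the Gaussian affinity, the even split of ε across iterations (motivated by sequential composition), the `sensitivity` parameter (which the caller must supply, since the abstract gives no sensitivity analysis), and all function names. Noise on the cluster centers, the second level mentioned in the abstract, is omitted. The sketch should not be read as the author's algorithm or as carrying a formal privacy guarantee.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix with a zero diagonal (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A


def kmeans_1d(v, k, n_iter=50):
    """Plain Lloyd's algorithm on a 1-D embedding, initialised at quantiles."""
    centers = np.quantile(v, np.linspace(0.0, 1.0, k))
    labels = np.zeros(len(v), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = v[labels == j].mean()
    return labels


def dp_power_iteration_clustering(X, k, epsilon, sensitivity,
                                  n_iter=10, sigma=1.0, seed=0):
    """Power iteration clustering with Laplace noise on the iterated vector.

    epsilon=None disables the noise. `sensitivity` is a hypothetical,
    caller-supplied bound; it is not derived here. Assumes every point has
    at least one neighbour with non-zero affinity.
    """
    rng = np.random.default_rng(seed)
    A = gaussian_affinity(X, sigma)
    degree = A.sum(axis=1)
    W = A / degree[:, None]        # row-normalised affinity matrix
    v = degree / degree.sum()      # degree-based starting vector
    for _ in range(n_iter):
        v = W @ v                  # one power-method step
        if epsilon is not None:
            eps_step = epsilon / n_iter  # assumed even split of the budget
            v = v + rng.laplace(0.0, sensitivity / eps_step, size=v.shape)
        v = v / np.abs(v).sum()    # L1 normalisation
    return kmeans_1d(v, k)
```

Under the even split assumed above, each of the `n_iter` noisy steps spends ε / `n_iter`, so by sequential composition the budgets sum to ε. Calling the function with `epsilon=None` gives a noise-free baseline against which the effect of different values of ε can be compared, in the spirit of the experiments the abstract describes.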
Keywords/Search Tags: Data mining, Differential privacy, Power iterative clustering, Data perturbation