
Differential Privacy Data Aggregation Optimizing Method And Its Application To Data Visualization

Posted on: 2014-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 1228330398957637
Subject: Control theory and control engineering
Abstract/Summary:
National information systems that bear on people's livelihood, such as those for medical treatment, finance, and residence management, hold large amounts of private data, and many of these sensitive datasets are large-sample data. Data visualization technology helps users explore the implicit characteristics of such data quickly and accurately. How to visualize large-sample data that contains sensitive information is the main research question of this dissertation.

This dissertation studies differential privacy, first proposed by Dwork in 2006. Differential privacy defines a strict attack model and preserves privacy through data distortion by adding noise. It has two major advantages: ① the risk of privacy disclosure is independent of the attacker's background knowledge; ② the amount of noise added does not grow with the size of the dataset. Because differential privacy achieves a high level of privacy protection by adding a small amount of noise that is unrelated to the dataset size, it is well suited to the problem of secure visualization of large-scale samples.

To reduce image overlap in data visualization and improve the image quality of large-sample visualizations, data is usually aggregated before it is visualized. Differentially private data aggregation is difficult for two reasons: ① the number of clusters in data aggregation is often large, and the aggregation success ratio is very low because of the added noise; ② the number of iterations is uncertain, which leads to excessive consumption of the privacy budget, and once the privacy budget is used up it is difficult to achieve effective privacy protection.
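As an illustration of the noise-adding idea described above, the following minimal sketch applies the Laplace mechanism to a counting query. It is not taken from the dissertation: the function name `laplace_count` and its parameters are hypothetical, and it assumes that neighboring datasets differ by one record, so a count has sensitivity 1. The noise scale depends only on the sensitivity and ε, not on the number of records.

```python
import numpy as np


def laplace_count(data, predicate, epsilon, rng=None):
    """Return a noisy count of the items in `data` that satisfy `predicate`.

    Hypothetical helper for illustration only. Assumes neighboring datasets
    differ by adding or removing one record, so the count has sensitivity 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1.0  # one record changes a count by at most 1
    # Laplace noise with scale sensitivity / epsilon; independent of len(data).
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise
```

With the same random seed, the noise added to a count over 100 records and over 100,000 records is identical, so the relative error shrinks as the dataset grows.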
Against this background, this dissertation studies differentially private data aggregation and its application to data visualization. The work covers four points:

1. We study the theory and methods for achieving ε-differential privacy, analyze the interactive and non-interactive mechanisms under differential privacy, and list the advantages of differential privacy over k-anonymity and l-diversity under several attack models. Because the rate of privacy budget consumption is closely related to the sensitivity, we analyze the sensitivity bound in several specific cases.

2. In large-sample data visualization, the dataset is usually aggregated first to reduce overlap and coincidence in the visualization image and so improve image quality. Clustering is the basic means of data aggregation, so we study differentially private clustering. To address the poor clustering availability of differentially private k-means, we improve the selection of initial centers, present a new method called IDP k-means, and prove that it satisfies ε-differential privacy. Compared with the existing privacy-preserving k-means clustering, IDP k-means achieves better clustering availability at the same privacy level. For a more objective evaluation, we also compare IDP k-means with a similar privacy-preserving clustering algorithm; our experiments show that IDP k-means achieves better clustering quality on the indicators, and the advantage is more obvious on large-sample datasets.

3. Aggregation differs from clustering in that an aggregation algorithm must support a greater number of clusters. With a larger number of clusters, many clustering algorithms suffer from low clustering availability, a large number of iterations, and low efficiency. Motivated by this, we propose a data aggregation algorithm called equipartition k-means++, which improves conventional k-means for data visualization so that more clusters can be aggregated efficiently. The aggregated data obtained by equipartition k-means++ preserves most features of the original data, and the algorithm also improves visualization image quality. Our experiments show that at each value of DAL, equipartition k-means++ gives good results not only in visualization image quality but also in the quality metrics HDM and NNM.

4. Differentially private data aggregation is one of the main research topics of this dissertation. We propose the Differential Privacy Equipartition k-means (DPE k-means) algorithm, which not only preserves privacy but also solves the problem of serious image overlap and low image quality in large-sample data visualization; the aggregated data preserves the distribution, associations, and clusters of the original dataset well. Compared with IDP k-means at the same ε, DPE k-means supports a larger maximum DAL, which means more clusters. The aggregated points are distributed more uniformly over the original dataset, and the related aggregation data quality metrics are improved. The running time of the algorithm is at least halved compared with traditional k-means aggregation.
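For orientation, a generic differentially private k-means can be sketched as below. This is a textbook-style baseline, not the IDP k-means, equipartition k-means++, or DPE k-means algorithms of the dissertation, whose details are not given in this abstract. The function name `dp_kmeans` and its parameters are hypothetical. The sketch assumes every point lies in [0, 1]^d and that neighboring datasets differ by one record, so the per-iteration L1 sensitivity of all cluster counts and coordinate sums together is d + 1. It fixes the number of iterations in advance and splits ε evenly among them, which sidesteps the uncertain budget consumption mentioned in the abstract.

```python
import numpy as np


def dp_kmeans(points, k, epsilon, iterations=5, rng=None):
    """Generic noisy k-means sketch (hypothetical baseline, not the author's method).

    Assumes `points` has shape (n, d) with every coordinate in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = points.shape
    eps_iter = epsilon / iterations   # even split of the privacy budget
    scale = (d + 1) / eps_iter        # Laplace scale = sensitivity / epsilon
    centers = rng.random((k, d))      # data-independent initial centers
    for _ in range(iterations):
        # Assign each point to its nearest current center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            # Perturb the cluster size and the per-coordinate sums.
            noisy_count = len(members) + rng.laplace(0.0, scale)
            noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
            if noisy_count >= 1.0:
                # Clip so the noisy center stays inside the data domain.
                centers[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
            # Otherwise keep the previous center for this cluster.
    return centers
```

Because the noise scale grows with both the number of iterations and the dimension, a baseline of this kind degrades as k increases and clusters shrink, which is the regime the aggregation algorithms in points 3 and 4 target.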
Keywords/Search Tags: Differential Privacy Preserving, Clustering, Equipartition k-means, Data Aggregation, Data Visualization, Quality Metrics