Differential Privacy Data Aggregation Optimizing Method And Its Application To Data Visualization

Posted on:2014-02-24

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Li

Full Text:PDF

GTID:1228330398957637

Subject:Control theory and control engineering

Abstract/Summary:

National information systems that bear on people's livelihood, such as those for medical treatment, finance, and residence management, hold a great deal of private data, and many of these sensitive datasets are large-sample data. Data visualization technology helps users explore the implicit characteristics of such data quickly and accurately. How to visualize large-sample data that contains sensitive information is the main research question of this dissertation.

We study differential privacy, first proposed by Dwork in 2006. It defines a strict attack model and preserves privacy through data distortion, that is, by adding noise. Differential privacy has two major advantages: (1) the risk of privacy disclosure is independent of the attacker's background knowledge; (2) the amount of noise added does not grow with the size of the dataset. Because differential privacy achieves a high level of privacy protection by adding a small amount of noise that is unrelated to the dataset size, it is well suited to the secure visualization of large-scale samples.

To reduce image overlap and improve image quality when visualizing large-sample data, the data is usually aggregated before visualization. Differentially private data aggregation is difficult for two reasons: (1) the number of clusters in data aggregation is often large, and the added noise makes the aggregation success ratio very low; (2) the number of iterations is uncertain, which leads to excessive consumption of the privacy budget, and once the privacy budget is used up it is difficult to achieve effective privacy protection.
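
To illustrate the second advantage noted above and the budget problem in iterative aggregation, here is a minimal sketch of the Laplace mechanism. It is not taken from the dissertation. It assumes a counting query with sensitivity 1, and the function names `laplace_noise`, `private_count`, and `per_iteration_epsilon` are hypothetical. The noise scale depends only on the sensitivity and ε, not on the number of records. Splitting a total budget evenly over a fixed number of iterations is one simple way to keep an iterative algorithm from exhausting it.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count of the records that satisfy predicate.

    Assumption: adding or removing one record changes the count by at
    most 1, so the sensitivity of the query is 1.
    """
    sensitivity = 1.0
    true_count = sum(1 for r in records if predicate(r))
    # The noise scale is sensitivity / epsilon; it does not involve len(records).
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def per_iteration_epsilon(total_epsilon, iterations):
    """Split a total budget evenly over a fixed number of iterations.

    By sequential composition, the per-iteration budgets add up to the total.
    """
    return total_epsilon / iterations


rng = random.Random(0)
is_even = lambda x: x % 2 == 0
for n in (1_000, 100_000):
    data = list(range(n))
    noisy = private_count(data, is_even, epsilon=0.5, rng=rng)
    # Print the dataset size and the absolute error of the released count.
    print(n, abs(noisy - n / 2))

print(per_iteration_epsilon(1.0, 10))
```

With ε fixed, the absolute error is drawn from the same distribution for both dataset sizes, so the relative error shrinks as the dataset grows.
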
Against this background, we study differentially private data aggregation and its application to data visualization. The work covers four points:

1. We study the theory and methods for achieving ε-differential privacy, analyze the interactive and non-interactive mechanisms under differential privacy, and list the advantages of differential privacy over k-anonymity and l-diversity under several attack models. Because the rate of privacy budget consumption is closely related to the sensitivity, we analyze the sensitivity bound in several specific cases.

2. In large-sample data visualization, the dataset is usually aggregated first to reduce overlap and coincidence in the visualization images and so improve image quality. Clustering is the basic means of data aggregation, so we study differentially private clustering. To improve the poor clustering availability of differentially private k-means, we improve the selection of initial centers, present a new method called IDP k-means, and prove that it satisfies ε-differential privacy. Compared with existing privacy-preserving k-means clustering, IDP k-means achieves better clustering availability at the same privacy level. For a more objective evaluation, we also compare IDP k-means with similar privacy-preserving clustering algorithms; our experiments show that IDP k-means achieves better clustering quality on the indicators, and the advantage is more obvious on large-sample datasets.

3. Clustering and aggregation differ in that an aggregation algorithm needs to support a greater number of clusters. With a large number of clusters, many clustering algorithms suffer from low clustering availability, a large number of iterations, and low efficiency.
Motivated by this, we propose a data aggregation algorithm called equipartition k-means++, which improves conventional k-means for the purpose of data visualization so that more clusters can be aggregated efficiently. The aggregated data obtained by equipartition k-means++ preserves most features of the original data, and the algorithm also improves the visualization image quality. Our experiments show that at each value of DAL, equipartition k-means++ obtains good results not only in visualization image quality but also in the HDM and NNM quality metrics.

4. Differentially private data aggregation is one of the main research topics of this dissertation. We propose the Differential Privacy Equipartition k-means (DPE k-means) algorithm, which not only preserves privacy but also solves the problems of serious image overlap and low image quality in large-sample data visualization; the aggregated data keeps the distribution, associations, and clusters of the original dataset well. Compared with IDP k-means at the same ε, DPE k-means supports a larger maximum DAL, which means more clusters. The aggregated points are distributed more uniformly over the original dataset, and the related aggregation data quality metrics improve. The running time of the algorithm is at least halved compared with traditional k-means aggregation.
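
For orientation, the kind of differentially private k-means that the points above set out to improve can be sketched as follows. This is a generic baseline under stated assumptions, not the IDP k-means, equipartition k-means++, or DPE k-means algorithms of the dissertation, whose details the abstract does not give. The sketch assumes that every point lies in [0, 1]^d, that neighbouring datasets differ by adding or removing one record, that the iteration count is fixed in advance, and that the budget is split evenly across iterations. The names `laplace` and `dp_kmeans` are hypothetical.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans(points, k, epsilon, iterations, rng):
    """Baseline differentially private k-means (illustrative sketch only).

    Assumption: every point lies in [0, 1]^d, so one record changes a
    cluster count by at most 1 and a cluster sum by at most 1 per dimension.
    """
    d = len(points[0])
    eps_iter = epsilon / iterations   # fixed iteration count, even budget split
    scale = (d + 1) / eps_iter        # L1 sensitivity of (count, sums) is d + 1
    # Data-independent random start, so initialization consumes no budget.
    centers = [[rng.random() for _ in range(d)] for _ in range(k)]
    for _ in range(iterations):
        counts = [0.0] * k
        sums = [[0.0] * d for _ in range(k)]
        for p in points:
            # Assign the point to its nearest current center.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            counts[j] += 1
            for i in range(d):
                sums[j][i] += p[i]
        for j in range(k):
            noisy_count = counts[j] + laplace(scale, rng)
            if noisy_count < 1.0:
                continue  # noisy count unusable: keep the previous center
            # Noisy mean, clamped back into the unit cube (post-processing).
            centers[j] = [
                min(1.0, max(0.0, (sums[j][i] + laplace(scale, rng)) / noisy_count))
                for i in range(d)
            ]
    return centers


rng = random.Random(1)
pts = [[rng.random() * 0.2, rng.random() * 0.2] for _ in range(500)]
pts += [[0.8 + rng.random() * 0.2, 0.8 + rng.random() * 0.2] for _ in range(500)]
print(dp_kmeans(pts, k=2, epsilon=1.0, iterations=5, rng=rng))
```

Fixing the number of iterations bounds the total budget at ε. The cost is that each iteration receives only ε / iterations, so the noise per cluster grows with the iteration count and matters more as the number of clusters rises and each cluster holds fewer points.
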

Keywords/Search Tags:

Differential Privacy Preserving, Clustering, Equipartition k-means, Data Aggregation, Data Visualization, Quality Metrics

Related items

1	Research On Clustering Algorithms In Differential Privacy
2	Research On Privacy-preserving Clustering Based On Differential Privacy
3	Research On Privacy Preserving K-means Clustering Algorithm
4	Research On K-means Clustering Algorithm Based On Differential Privacy
5	Research On K-means Clustering Algorithm Based On Differential Privacy Protection
6	Research And Design Of Privacy Preserving K-means Clustering Algorithm In The Cloud
7	Research On Improvement Of K-means Clustering Algorithm Based On Differential Privacy
8	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
9	Research On K-means++ Clustering Algorithm Based On Laplace Mechanism For Differential Privacy Protection
10	Adaptive Differential Privacy And Its Applications