Font Size: a A A

Differential Privacy Multi-dimensional Data Partitioning Publication

Posted on:2017-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:Q LuFull Text:PDF
GTID:2348330512475268Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of information technology,data collect and release are more and more common,and the value of the data is highlighted.At the same time,data privacy and security issues have become an increasing concern.To this end,researchers have proposed many effective privacy protection models,in which the differential model with its characteristic strict protection of privacy caused widespread concern in many domestic and foreign scholars.In recent years,many studies have been conducted on differential privacy.The study of differential privacy data distribution method focused on the one-dimensional histogram data,few studies of multi-dimensional data.At present,the practical application generated a lot of multi-dimensional data,and how to maximize the data availability under the premise of ensuring the privacy and security of the data,it has become a hot topic for researchers.This thesis mainly focuses on the research of the differential privacy multi-dimensional data publication,trying to design a high efficiency data publishing algorithm with data privacy security reliability and publish data availability.The main work includes:(1)In order to boost the accuracy of range counting queries on the released differential privacy two-dimensional space data,an algorithm Quad-heu based on quad-tree for differential privacy two-dimensional data publication by space partitioning is proposed.The basic idea of Quad-heu is to firstly construct a quad-tree with respect to the two-dimensional data and then add Laplace noise to tree nodes.After that,a bottom up heuristic approach for structural adjustment of the quad-tree is put forward,the aim of which is to balance the noise error of queries and the error of uniform hypothesis.Finally,the accuracy of range counting queries is further reduced by post-processing the tree nodes,values through the consistency constraint of queries.Experimental analysis is designed by comparing Quad-heu and the traditional algorithms on the accuracy of range counting queries in the released data and the algorithm efficiency.Experimental results show that Quad-heu is effective and feasible.(2)Aiming at the shortage of quad-tree can only deal with two-dimensional data,a differential privacy multi-dimensional data publication algorithm based on multi-dimensional interval tree is proposed.The algorithm constructs a multi-dimensional interval tree with respect to the multi-dimensional data.In the process of building tree,we use the heuristic method to adjust the structure of the interval tree,on the one hand to balance the query noise errors and uniformity hypothesis error,on the other hand can greatly reduce the space consumption.The experimental results show that the proposed algorithm is effective and feasible by comparing the algorithm with the same kind of algorithm in the 2D and 3D data.(3)In the case of the large space consumption of multi-dimensional interval tree,the privacy budget assigned to nodes is less and the accuracy of interval counter query is poor,a new algorithm based on multi-level grid partition is proposed.The algorithm partitions the multi-dimensional'space to multi-level grid.For each grid node,choose whether to further partition based on its data distribution.At the same time,the algorithm uses kd-tree to merge nodes in the same level.The experimental results show that the algorithm is effective and feasible in the multi-dimensional data.
Keywords/Search Tags:Differential Privacy, Multi-dimensional Data, Space Partitioning, Quadtree, Multi-dimensional Interval Tree, Kd-tree
PDF Full Text Request
Related items