Publishing Algorithm In Location Data Based On Differential Privacy And Grid Clustering

Posted on: 2020-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: D N Yang
GTID: 2428330602458026
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous innovation of smart terminals, location-based applications can improve service quality by analyzing collected location data. However, location data contains sensitive personal information, so it needs to be protected before being released to third-party organizations. Differential privacy does not rely on assumptions about the attacker's background knowledge and provides strict privacy guarantees, which makes it well suited to data publication. Although existing differential privacy algorithms satisfy the privacy protection requirements, the published data have low availability because of excessive noise accumulation. To overcome this drawback, this paper proposes two improvements.

For datasets with small data volume and uniform data distribution, this paper proposes a threshold-based location data publishing algorithm. After partitioning is finished, the algorithm randomly selects a grid cell and finds its adjacent grid cells. If the variance of the count values of an adjacent grid cell and the current cluster is less than a given threshold, the cell is merged into the cluster. Noise is then added to each cluster and distributed evenly over the grid cells within the cluster, which reduces the error caused by noise accumulation. The selection range of the threshold is derived from the relationship between the noise error and the non-uniformity error.

For datasets with large data volume and poor uniformity, this paper proposes a location data publishing algorithm based on squared error. After partitioning is finished, the algorithm first adds noise to each grid cell and then clusters the grid cells according to their real count values. During clustering, each time a new grid cell is added to the current cluster, noise is added to the cluster and the noisy result is divided evenly among its cells. The algorithm then calculates two sums of squared errors against the real count values: one for the noise added directly to each grid cell and one for the noise added after clustering. It selects whichever way of adding noise gives the smaller sum of squared errors. Besides solving the noise accumulation problem, this further reduces the running time of the algorithm.

The two improved algorithms are compared with other similar algorithms on a real dataset. The experimental results show that the proposed algorithms reduce the query error and improve the accuracy of query results, thus improving the availability of the data.
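The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the 4-neighbour adjacency, the greedy region-growing order, the use of population variance, and a query sensitivity of 1 (Laplace scale `1 / epsilon`) are all assumptions made for illustration. The sketch also makes no claim about the privacy accounting of the data-dependent clustering step.

```python
import random
from statistics import pvariance


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cluster_grid(counts, threshold, rng):
    """Greedy region growing (assumed strategy): absorb an adjacent cell while
    the variance of the cluster's counts, including that cell, stays below
    `threshold`."""
    rows, cols = len(counts), len(counts[0])
    unassigned = {(r, c) for r in range(rows) for c in range(cols)}
    clusters = []
    while unassigned:
        seed = rng.choice(sorted(unassigned))  # randomly selected start cell
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            r, c = frontier.pop()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb not in unassigned:
                    continue  # outside the grid or already clustered
                values = [counts[i][j] for i, j in cluster]
                values.append(counts[nb[0]][nb[1]])
                if pvariance(values) < threshold:
                    unassigned.remove(nb)
                    cluster.append(nb)
                    frontier.append(nb)
        clusters.append(cluster)
    return clusters


def publish(counts, clusters, epsilon, rng):
    """Add one Laplace draw per cluster and split the noisy total evenly."""
    released = [[0.0] * len(row) for row in counts]
    for cluster in clusters:
        total = sum(counts[r][c] for r, c in cluster)
        share = (total + laplace_noise(1.0 / epsilon, rng)) / len(cluster)
        for r, c in cluster:
            released[r][c] = share
    return released


def choose_release(true_vals, per_cell_noisy, cluster_noisy):
    """Keep the candidate with the smaller sum of squared errors."""
    def sse(candidate):
        return sum((n - t) ** 2 for n, t in zip(candidate, true_vals))
    if sse(per_cell_noisy) <= sse(cluster_noisy):
        return per_cell_noisy
    return cluster_noisy


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [[10, 11, 50], [9, 10, 52], [10, 12, 49]]  # toy grid counts
    clusters = cluster_grid(counts, threshold=4.0, rng=rng)
    for row in publish(counts, clusters, epsilon=1.0, rng=rng):
        print([round(v, 2) for v in row])
```

In this toy grid the cells with counts near 10 and the cells with counts near 50 cannot share a cluster, because merging them would push the variance far above the threshold. Each cluster receives a single noise draw instead of one draw per cell, which is the noise-accumulation saving the abstract describes.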
Keywords/Search Tags: Differential Privacy, Data Publication, Privacy Protection, Grid Clustering