Font Size: a A A

Research On Publishing Histograms Under Data Differential Privacy

Posted on:2017-11-12Degree:MasterType:Thesis
Country:ChinaCandidate:B ShaoFull Text:PDF
GTID:2348330518970770Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Differential privacy is a new privacy preserving model, and histograms are important form of data releasing under differential privacy. The traditional methods of histogram publishing based on differential privacy preserving are data conversion and data compression,which are to reconstruct the original histogram and disturbing the true bins’ frequencies of histogram by adding proper noise, in order to implement the privacy preserving. And it is important to balance the reconstruction error and the noise error during that process. Although many strategies have been put forward to balance those errors, none of them considers the impact of the outliers existed in the bins’ frequencies of original histograms on the reconstruction error.Hence, the main goal of this paper is to study the outliers existed in the bins’ frequencies of original histograms, to analyze the impacts of outliers and alternative distribution degree on the reconstruction, then to put forward a method of histogram publishing with outliers under differential privacy. The main content of this paper is as follows:Firstly, the outliers and alternative distribution degree are defined systematically, and the impacts of them on the reconstruction are also analyzed in detail.Secondly, decreasing the alternative distribution degree algorithm (De-ADD) is proposed according to the impacts of outliers and alternative distribution degree on the reconstruction.This algorithm preprocesses the frequency sequence of the original histogram’s bins by comparing the adjacent bins’ frequencies with noise, which satisfies the requirement of differential privacy and reduces the alternative distribution degree. Also the impacts of outliers and alternative distribution degree on the reconstruction is reduced.Thirdly, the so-called Merge-Bins algorithm which is based on the strategy of merger bins is proposed. The main idea of this algorithm is to adopt the greedy thought to reconstruct the histogram, then to merge the two adjacent bins with the most similar frequency each time according to the exponential mechanism, and stopping the reconstruction until reaching the minimum error. Finally, we add Laplace noise to each bin’s frequency of histogram to implement the differential privacy preserving.Finally, simulation based on real datasets is used to evaluate the effect of the proposed method. According to De-ADD algorithm proposed in this paper, some original histograms are preprocessed to reduce the alternative distribution degree, then they are compared with other histograms with no preprocessing to verify the availability of that algorithm. At last,the differences between histograms with outliers publishing method under differential privacy(Outlier-HistoPub) and those existing methods of histogram publishing are obtained, in order to verify the accuracy and effectiveness of Outlier-HistoPub method.
Keywords/Search Tags:DP, Histogram, Outlier, ADD, Noise
PDF Full Text Request
Related items