Research On Key Technologies Of Data Analysis Satisfying Local Differential Privacy

Posted on:2023-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:X T Bai

Full Text:PDF

GTID:2568306614989099

Subject:Cyberspace security

Abstract/Summary:

PDF Full Text Request

With the continuous development of information technology in the era of big data,the privacy of personal information including sensitive data has also attracted more and more attention.How to conduct data release and data analysis on the premise of protecting personal sensitive information from being leaked is a major challenge currently facing.As a new privacy protection model,local differential privacy has strong privacy protection.Compared with centralized differential privacy,it does not rely on any trusted third party and protects the user’s sensitive information locally.It not only protects against attackers with any background knowledge,but also protects against privacy attacks by any untrusted third parties.Under local differential privacy,we first study the one-dimensional density estimation problem,that is,how to reconstruct the distribution over a numerical domain.Existing solutions do not consider prior knowledge about the distribution of the overall data when performing distribution estimation,resulting in low data utility.We propose an incremental learning method based on prior knowledge to reconstruct the distribution.This method includes multiple rounds of collection.The main idea of this method is,in the first round,we first collect some users’ data under the assumption of no prior knowledge and then use the obtained statistical information to estimate the distribution of the dataset.From the second round,we consider using distribution information obtained in previous rounds as the prior knowledge to estimate the distribution of the dataset further accurately.We design two data mapping algorithms to transform the data to be collected by combining prior knowledge,and design a distribution aggregation algorithm to integrate the results of multiple rounds of collection.Experiments show that compared with existing methods,our scheme can significantly improve the accuracy of distribution estimation under the premise of providing users with privacy protection.In addition,we study the heavy hitter identification problem over a discrete domain,which aims to find more frequent items.We analyze the existing solving algorithms and propose a priority queue method to improve it.The main idea of this method is to gradually identify the frequent data prefixes from the fixed-length data by introducing the data structure of the priority queue,and finally identify the frequent items of the complete data.We verify the effectiveness of the scheme through experiments,and the results show that the improved scheme has higher data utility.

Keywords/Search Tags:

local differential privacy, distribution estimation, heavy hitter identification, data publish

PDF Full Text Request

Related items

1	An Algorithm For Detecting Heavy Hitters Based On Light Flow Eviction
2	Research On Statistical Estimation Methods For High-dimensional Data Based On Local Differential Privac
3	Statistical Data Analyses With Local Differential Privacy
4	Research On Heavy-Hitter Detection And Anomaly Detection In High-Speed Networks
5	Statistical Data Analyses Based On Local Differential Privacy
6	Research On Privacy-preserving Of Check-in Location Data Based On Local Differential Privacy
7	Research On Data Perturbation Technology Based On Local Differential Privacy
8	Research On Privacy-Preserving Truth Discovery Methods Based On Distribution Estimation
9	Research On Personalized Local Differential Privacy Scheme For Multidimensional Data
10	High-dimensional Data Publishing Algorithms Based On Local Differential Priacy