Statistical Data Analyses Based On Local Differential Privacy

Posted on:2022-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:H Fu

GTID:2518306323478724

Subject:Information security

Abstract/Summary:

With the prevalence of smart devices and the penetration of online services, varied and fine-grained personal data are being collected and analyzed by service providers in order to optimize service strategy and improve service quality. However, directly collecting data from users raises privacy concerns, since personal data may reveal sensitive information about individuals. For example, a user's location data might disclose his home address and daily activities; a user's web browsing log could reveal his identity and preferences. Local differential privacy (LDP) is an emerging privacy-preserving data aggregation framework that does not rely on trusted data curators or third parties. It provides provable privacy protection on the user's side, independent of the adversary's background knowledge and computational power, while incurring low computational and communication costs, and is therefore applicable to large-scale data collection and analysis tasks. Within the framework of LDP, this dissertation studies privacy-preserving collection and analysis of key-value data and set-valued data, which are useful for representing user-generated data in various online services. Specifically, this dissertation makes the following contributions.

• For the privacy preservation of key-value data, this dissertation proposes the PHR mechanism. It provides rigorous data privacy protection (i.e., satisfying LDP) on the user side and allows effective statistical analyses (i.e., frequency estimation for keys and mean estimation for values) on the server side. Existing approaches follow a sampling-then-perturbing paradigm and thus suffer from low data utility and large estimation error. In the PHR mechanism, we pad empty key-value pairs to preserve the information of missing keys, and adopt a hashing technique to pack more information into the privatized data that users upload during randomization. Theoretical analysis and experimental results demonstrate that our method provides better utility under the same LDP guarantees than state-of-the-art mechanisms and achieves an average 50% error reduction for both frequency and mean estimation.

• For the privacy preservation of set-valued data, this dissertation proposes the PrivFIM mechanism under local d-privacy constraints, which capture the intrinsic dissimilarity between set-valued data within the framework of differential privacy. It provides rigorous data privacy protection (i.e., satisfying local d-privacy) on the user side and allows effective statistical analyses (i.e., itemset frequency estimation and frequent itemset mining) on the server side. Specifically, each user perturbs his set-valued data locally to guarantee that the server cannot infer the user's original itemset with high confidence. The server can reconstruct an unbiased estimate of itemset frequency from these randomized data and then combine it with an Apriori-based pruning technique to identify frequent itemsets efficiently and accurately. Extensive experiments conducted on real-world and synthetic datasets demonstrate an average 30% error reduction compared with existing mechanisms.
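
To make the general LDP workflow in the abstract concrete (users randomize locally, the server debiases the aggregate), here is a minimal Python sketch of generalized randomized response (k-RR), a standard LDP frequency oracle. It is not the PHR or PrivFIM mechanism from the thesis, whose details are not given here. The function names `krr_perturb` and `krr_estimate`, the example domain, and the privacy budget are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def krr_perturb(value, domain, epsilon, rng):
    """User side: report the true value with probability p, otherwise
    report a uniformly chosen different value (satisfies epsilon-LDP)."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def krr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimates from perturbed reports.

    E[observed share of v] = f_v * p + (1 - f_v) * q, so solving for f_v
    gives (observed - q) / (p - q).
    """
    n = len(reports)
    k = len(domain)
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true value
    q = 1.0 / (e + k - 1)    # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for repeatability
    domain = ["a", "b", "c", "d"]               # hypothetical item domain
    truth = ["a"] * 10000 + ["b"] * 6000 + ["c"] * 4000
    reports = [krr_perturb(v, domain, 1.0, rng) for v in truth]
    for item, freq in krr_estimate(reports, domain, 1.0).items():
        print(item, round(freq, 3))
```

Individual estimates can fall slightly below 0 or above 1 because the estimator trades bounded output for unbiasedness; the estimates always sum to 1. The error grows with the domain size, which is one reason mechanisms for richer data types such as key-value pairs and itemsets need more specialized designs.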

Keywords/Search Tags:

Data Privacy, Differential Privacy, Key-value Data, Set-valued Data, Frequency Estimation, Mean Estimation, Frequent Itemset Mining

Related items

1	Research On Frequency Estimation And Frequent Itemset Mining For Local Differential Privacy Protection
2	Research On Frequent Itemset Mining Based On Differentially Private Model
3	Research On Frequent Itemset Mining Of Complex Data Based On Local Differential Privacy
4	Study On The Frequent Itemset Mining Based On Differential Privacy
5	Statistical Data Analyses With Local Differential Privacy
6	Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation
7	Research On Privacy-preserving Of Check-in Location Data Based On Local Differential Privacy
8	A Study Of Efficiently Privacy Preserving Data Publishing Of Set-valued Data
9	Research Of Frequent Itemsets Mining Algorithm With Differential Privacy For Large-scale Data
10	Frequent Itemsets Mining For Uncertain Data Based On Differential Privacy