Statistical Data Analyses Based On Local Differential Privacy

Posted on: 2022-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Fu
Full Text: PDF
GTID: 2518306323478724
Subject: Information security
Abstract/Summary:
With the prevalence of smart devices and the spread of online services, service providers collect and analyze varied, fine-grained personal data in order to optimize service strategy and improve service quality. However, collecting data directly from users raises privacy concerns, since personal data may reveal sensitive information about individuals. For example, a user's location data might disclose the user's home address and daily activities; a user's web browsing log could reveal the user's identity and preferences. Local differential privacy (LDP) is an emerging privacy-preserving data aggregation framework that does not rely on trusted data curators or third parties. It provides provable privacy protection on the user's side, independent of the adversary's background knowledge and computational power, while incurring low computational and communication costs, and is therefore applicable to large-scale data collection and analysis tasks. Within the framework of LDP, this dissertation studies privacy-preserving collection and analysis of key-value data and set-valued data, which are useful for representing user-generated data in various online services. Specifically, this dissertation makes the following contributions.

· For the privacy preservation of key-value data, this dissertation proposes the PHR mechanism. It provides rigorous data privacy protection (i.e., satisfying LDP) on the user side and allows effective statistical analyses (i.e., frequency estimation for keys and mean estimation for values) on the server side. Existing approaches follow a sampling-then-perturbing paradigm and thus suffer from low data utility and large estimation error. The PHR mechanism pads empty key-value pairs to preserve information about missing keys and adopts a hashing technique to pack more information into the privatized data that users upload during randomization. Theoretical analysis and experimental results demonstrate that the method provides better utility than state-of-the-art mechanisms under the same LDP guarantees and achieves an average 50% error reduction for both frequency and mean estimation.

· For the privacy preservation of set-valued data, this dissertation proposes the PrivFIM mechanism under local d-privacy constraints, which capture the intrinsic dissimilarity between set-valued data within the framework of differential privacy. It provides rigorous data privacy protection (i.e., satisfying local d-privacy) on the user side and allows effective statistical analyses (i.e., itemset frequency estimation and frequent itemset mining) on the server side. Specifically, each user perturbs the set-valued data locally so that the server cannot infer the user's original itemset with high confidence. The server reconstructs an unbiased estimate of itemset frequency from the randomized data and combines it with an Apriori-based pruning technique to identify frequent itemsets efficiently and accurately. Extensive experiments on real-world and synthetic datasets demonstrate an average 30% error reduction compared with existing mechanisms.
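
To make the user-side/server-side split described in the abstract concrete, the following is a minimal sketch of LDP frequency estimation using generalized randomized response, a standard textbook mechanism. It is not the PHR or PrivFIM mechanism proposed in the dissertation, whose details the abstract does not give. The function names `grr_perturb` and `grr_estimate`, the toy domain, and the choice of epsilon are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """User side: report the true value with probability p, otherwise
    report a uniformly chosen different value from the domain.
    With p = e^eps / (e^eps + k - 1) this satisfies eps-LDP."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimate from perturbed reports.
    Each observed frequency is debiased using the known probabilities
    p (keep the true value) and q (flip to one specific other value)."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


# Illustrative usage on a toy dataset (all values here are assumptions).
rng = random.Random(0)
domain = ["a", "b", "c", "d"]
data = ["a"] * 10000 + ["b"] * 6000 + ["c"] * 3000 + ["d"] * 1000
reports = [grr_perturb(v, domain, 2.0, rng) for v in data]
estimates = grr_estimate(reports, domain, 2.0)
```

The server never sees the raw values, only the perturbed reports, yet the debiased estimates converge to the true frequencies as the number of users grows. Smaller values of epsilon give stronger privacy at the cost of higher estimation variance.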
Keywords/Search Tags: Data Privacy, Differential Privacy, Key-value Data, Set-valued Data, Frequency Estimation, Mean Estimation, Frequent Itemset Mining