
Research On Differential Privacy Preservation For Data Analysis

Posted on: 2023-02-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K Pan
GTID: 1528306905996979    Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Explosive data growth has brought both opportunities and challenges to human society. On the one hand, the analysis and processing of big data can accelerate the advancement of artificial intelligence and deliver tremendous value across all walks of life. On the other hand, analyzing and processing big data inevitably raises the critical issue of privacy leakage, which poses severe challenges to people's lives. The risk is not limited to the leakage of personal information; individuals' behavior can also be predicted from big data analysis. As leading approaches to artificial intelligence, machine learning and deep learning endow computers with the ability to learn from large-scale training data. However, such training data may contain sensitive personal information, such as financial status, addresses, and consumption records. During the training of machine learning and deep learning models, this sensitive information can be memorized inadvertently, leading to privacy leakage. Moreover, an untrusted data curator, training participants, or external adversaries may attempt to infer private information, reconstruct sensitive features, or even extract independent properties from the training data by building tailored attack models. The privacy leakage problem in big data analysis and processing is therefore non-negligible.

As one of the mainstream privacy-preserving techniques, differential privacy makes no assumptions about the adversary's background knowledge and, owing to its strong mathematical foundation, provides provable and rigorous privacy guarantees. Training machine learning and deep learning models under differential privacy can thus effectively protect the sensitive information in training data and mitigate the reconstruction and inference of that information. In recent years, a variety of differentially private machine learning and deep learning models have been proposed and have produced abundant results in data security and privacy preservation. However, these traditional models often suffer from limitations such as a high total privacy budget and low model utility, which make the trade-off between data privacy and model accuracy a serious challenge. Motivated by these challenges, this dissertation focuses on the shortcomings of traditional differential privacy preservation methods for data analysis and alleviates the privacy leakage problem during the training of machine learning and deep learning models. Guided by the real-world requirements of dynamic privacy perception and on-demand privacy preservation, we explore personalized noise perturbation strategies based on objective perturbation, output perturbation, and gradient perturbation, build novel privacy budget allocation rules, provide rigorous and comprehensive proofs of the privacy-preserving level of the models, and achieve a graceful compromise among model utility, data availability, and privacy-preserving level. The research in this dissertation is summarized as follows:

(1) Existing differentially private regression analysis methods mainly focus on protecting sensitive information while largely ignoring model utility. To address the privacy leakage in regression models, we propose a relevance-based differential privacy preservation model for regression analysis. We first compute the relevance between each input feature and the model output, and divide the input features into strongly relevant and weakly relevant features by applying a threshold to the computed relevance. We then transform the objective function into a polynomial form and achieve an appropriate noise perturbation by injecting less noise into the polynomial coefficients involving strongly relevant features and more noise into those involving weakly relevant features. Experiments on benchmark datasets indicate that the relevance-based differentially private regression model mitigates the drawbacks of traditional differentially private regression analysis methods.
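A minimal sketch of this idea is given below: the squared loss of a linear regression is expanded into polynomial coefficients (functional-mechanism style), and the per-coefficient Laplace noise is scaled according to an absolute-correlation relevance score. The relevance measure, threshold, budget split, and sensitivity bound are illustrative assumptions, not the dissertation's exact construction.

```python
import numpy as np

def relevance_weighted_regression(X, y, epsilon, threshold=0.3, strong_share=0.7):
    """Illustrative relevance-weighted objective perturbation for linear
    regression: the squared loss is a degree-2 polynomial in the weights w,
    with coefficients built from X^T X and X^T y; Laplace noise is added to
    those coefficients, spending a larger share of the budget (hence less
    noise) on coefficients tied to strongly relevant features.  All names and
    constants are assumptions for illustration.  Assumes each record is
    clipped so that ||x||_inf <= 1 and |y| <= 1."""
    n, d = X.shape
    # Relevance of each feature: absolute Pearson correlation with the target.
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    strong = relevance >= threshold                    # strongly relevant mask

    # Degree-2 polynomial coefficients of the squared loss in w.
    quad = (X.T @ X).astype(float)    # coefficients of w_i * w_j terms
    lin = (-2.0 * X.T @ y).astype(float)  # coefficients of w_i terms

    # Crude L1 sensitivity bound for the coefficient vector under the
    # clipping assumption above.
    sensitivity = 2.0 * (d ** 2 + d)

    # Budget split: strongly relevant coefficients receive the larger share.
    eps_strong = strong_share * epsilon
    eps_weak = (1.0 - strong_share) * epsilon

    for j in range(d):
        eps_j = eps_strong if strong[j] else eps_weak
        b = sensitivity / max(eps_j, 1e-12)            # Laplace scale b = Δ/ε
        lin[j] += np.random.laplace(0.0, b)
        quad[j, :] += np.random.laplace(0.0, b, size=d)

    # Minimise the perturbed quadratic objective in closed form; the small
    # ridge term keeps the noisy system well-posed.
    w = np.linalg.solve(quad + 1e-3 * np.eye(d), -0.5 * lin)
    return w
```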
(2) Because the above model divides input features into strongly and weakly relevant groups according to a fixed threshold, it suffers from shortcomings such as poor generality and low model accuracy. We therefore design a differentially private regression analysis approach with dynamic privacy budget allocation. The approach perturbs the objective function dynamically by adding noise to the coefficients of the polynomial objective function in proportion to the distinct contribution of each input feature to the model output. Furthermore, the approach verifies the achieved privacy-preserving level against the model inversion attack. Extensive experiments on benchmark datasets illustrate that the proposed method raises model accuracy under the same privacy-preserving level while yielding notable results in resisting the model inversion attack.

(3) The trade-off between model utility and privacy-preserving level also arises in differentially private deep neural networks. Facing the complex structure of neural networks, and aiming to mitigate their privacy threats while bridging the gap between the private and non-private models, we design a general framework for differentially private deep neural networks. Each neuron is regarded as a feature, and according to the different correlations between the neurons in each layer and the model output, we adaptively perturb the gradients of the loss function with respect to the neurons during backpropagation. As a general mechanism, the framework provides adaptive perturbation not only for stochastic gradient descent but also for the Momentum and Adam optimization algorithms. Theoretical analysis of the privacy-preserving level and rigorous experiments demonstrate that the designed framework performs desirably.
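The sketch below shows one adaptively perturbed DP-SGD step in the spirit of this framework: per-example gradients are clipped layer by layer, and each layer's Gaussian noise multiplier is modulated by an assumed relevance weight (higher relevance, less noise). The dictionary layout, the relevance-to-noise mapping, and all constants are assumptions for illustration; extending the same perturbation to Momentum or Adam would only change how the noisy gradient is consumed.

```python
import numpy as np

def adaptive_dp_sgd_step(params, per_example_grads, layer_relevance,
                         lr=0.1, clip_norm=1.0, base_sigma=1.0):
    """Illustrative adaptive gradient perturbation for one DP-SGD step.

    params:             dict  layer_name -> parameter array
    per_example_grads:  dict  layer_name -> array of shape (batch, *param_shape)
    layer_relevance:    dict  layer_name -> relevance weight in (0, 1]
    """
    batch = next(iter(per_example_grads.values())).shape[0]
    new_params = {}
    for name, g in per_example_grads.items():
        # Clip each example's gradient for this layer to bound the sensitivity.
        flat = g.reshape(batch, -1)
        norms = np.linalg.norm(flat, axis=1, keepdims=True)
        clipped = flat * np.minimum(1.0, clip_norm / (norms + 1e-12))

        # Relevance close to 1 keeps the noise multiplier near base_sigma;
        # weakly relevant layers receive a larger multiplier (more noise).
        sigma = base_sigma / max(layer_relevance[name], 1e-3)

        # Standard DP-SGD aggregation: sum clipped gradients, add Gaussian
        # noise calibrated to the clipping norm, then average over the batch.
        noise = np.random.normal(0.0, sigma * clip_norm, size=clipped.shape[1])
        noisy_mean = (clipped.sum(axis=0) + noise) / batch

        new_params[name] = params[name] - lr * noisy_mean.reshape(params[name].shape)
    return new_params
```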
(4) Multi-party learning avoids direct contact between the cloud server and the clients' training data: it resists privacy risks by letting clients train their models locally and share only a portion of the model parameters with the server. However, the sensitive information remains vulnerable to adversaries who aim to steal the privacy of the training data when the uploaded model parameters are not fully safeguarded. In addition, traditional privacy-preserving multi-party learning approaches mainly rely on techniques such as secure multi-party computation and homomorphic encryption, which may suffer from low computational efficiency and a high communication burden. We therefore present a zero-concentrated differentially private multi-party learning algorithm based on adaptive privacy loss allocation. By taking advantage of zero-concentrated differential privacy, the algorithm provides a stronger privacy guarantee and permits tighter bounds on the privacy computations in the multi-party learning model. We then design an adaptive privacy loss allocation method to reduce the accumulation of the total privacy budget, following the intuition that the privacy loss allocated to the model parameters should increase as the parameters gradually approach the optimum. Sufficient theoretical analysis and experiments validate that the proposed algorithm reduces the total privacy budget effectively while maximizing model utility.

(5) As unsupervised deep learning models, generative adversarial networks aim to capture the underlying distribution of the training data and generate realistic-looking samples. However, the high complexity of deep models allows generative adversarial networks to memorize training samples during training, which increases the risk of privacy leakage. To alleviate this issue, we propose a truncated concentrated differentially private generative adversarial network based on a personalized noise decay strategy. Specifically, we first achieve differential privacy for the discriminator by injecting Gaussian noise into its optimization process; by the post-processing property of differential privacy, the parameters of the generator then also satisfy differential privacy. Moreover, we design two noise decay strategies, which offer an intuitive handle for striking an elegant compromise between model utility and privacy preservation, so that different decay solutions can be chosen for different practical demands. Theoretical analysis of the privacy bound and extensive experiments show that our algorithm not only alleviates the privacy leakage in generative adversarial networks but also resists the membership inference attack.
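As a rough illustration of how noise decay can trade utility against privacy, the sketch below decays the discriminator's Gaussian noise multiplier over training steps with either an exponential or a step schedule. Both schedules, their constants, and the simplified (non-per-example) clipping are assumptions rather than the dissertation's exact strategies; the generator remains private purely by post-processing.

```python
import numpy as np

def noise_multiplier(step, base_sigma=2.0, strategy="exponential",
                     decay_rate=0.999, step_size=1000, gamma=0.9,
                     sigma_floor=0.5):
    """Illustrative noise decay schedules for the discriminator's Gaussian
    noise multiplier; the two strategies and all constants are assumptions
    standing in for the dissertation's personalized decay strategies."""
    if strategy == "exponential":
        sigma = base_sigma * (decay_rate ** step)          # smooth decay
    elif strategy == "step":
        sigma = base_sigma * (gamma ** (step // step_size))  # piecewise decay
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep a floor so late training iterations do not become effectively unprotected.
    return max(sigma, sigma_floor)

def private_discriminator_update(grad, step, lr=1e-3, clip_norm=1.0,
                                 strategy="exponential"):
    """One perturbed discriminator gradient step: clip the gradient, then add
    Gaussian noise whose scale follows the chosen decay schedule.  In practice
    the clipping is applied per example; a single averaged gradient stands in
    here for brevity."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    sigma = noise_multiplier(step, strategy=strategy)
    noisy = clipped + np.random.normal(0.0, sigma * clip_norm, size=grad.shape)
    return -lr * noisy   # parameter update to be added to the discriminator weights
```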
Keywords/Search Tags:Privacy leakage, differential privacy, privacy preservation, privacy budget, privacy-preserving level, model utility, noise perturbation