
Machine Learning Algorithms With Differential Privacy

Posted on: 2022-08-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: P Y Wang  Full Text: PDF
GTID: 1488306521466794  Subject: Statistics
Abstract/Summary:
While the popularity of the Internet facilitates people's lives, it also brings the risk of leaking sensitive personal data. The data collected today often contain sensitive information such as individual records from schools and hospitals, financial records used for fraud detection, online behavior from social media, and genomic data from cancer diagnosis. Modern machine learning algorithms can exploit fine-grained information about the data to make accurate predictions, which, however, can lead to privacy leakage. Studying privacy-preserving machine learning algorithms is therefore of great significance. Differential privacy, a privacy-preserving technology with rigorous mathematical foundations, is well suited to the needs of personal privacy protection in the era of big data. This dissertation focuses on differentially private machine learning algorithms. First, we study sparse classification learning algorithms with differential privacy and construct a privacy-preserving framework suitable for unstable sparse problems. Second, we focus on the privacy-preserving Stochastic Gradient Descent (SGD) algorithm with non-smooth losses in the setting of stochastic convex optimization, and systematically study the privacy and statistical generalization performance of differentially private SGD based on α-Hölder continuous losses. Third, we focus on privacy protection for imbalanced data, systematically study the output and objective perturbation mechanisms for regularized ERM in AUC maximization, and provide comprehensive results on their privacy guarantees. Finally, we develop a distributed logistic regression algorithm that satisfies differential privacy. The main contributions are as follows.

1. We present a differentially private version of a sparse classification approach. Based on the Alternating Direction Method of Multipliers (ADMM), we transform the solution of the sparse problem into a multistep iterative process, and then add exponential noise to the stable steps to achieve privacy protection. Furthermore, we prove that the proposed approach satisfies ε-differential privacy even when the original problem is unstable, and present a theoretical privacy bound for the differentially private classification algorithm. Finally, we apply our framework to logistic regression with the ℓ1 regularizer and with the ℓ1/2 regularizer. Numerical studies demonstrate that our method is both effective and efficient, performing well in sensitive data analysis.

2. We are concerned with differentially private SGD algorithms in the setting of stochastic convex optimization, and establish privacy and generalization (utility) guarantees for private SGD algorithms with non-smooth convex losses. Specifically, the loss function is relaxed to have an α-Hölder continuous gradient. We establish privacy and utility guarantees for SGD with output perturbation in an unbounded domain, which is the first known result of this kind for unbounded domains. We also prove that noisy SGD with α-Hölder smooth losses using gradient perturbation guarantees (ε,δ)-differential privacy and attains the optimal excess population risk, up to logarithmic terms, with gradient complexity (i.e., the total number of iterations) T = O(n^((2-α)/(1+α)) + n). This reveals an important trade-off between the α-Hölder smoothness of the loss and the computational complexity T for private SGD with statistically optimal performance.

3. We propose differentially private ERM algorithms for the important problem of AUC maximization in imbalanced classification. In particular, we systematically study the privacy guarantees of output perturbation and objective perturbation with respect to both ε-DP and (ε,δ)-DP. Furthermore, we establish utility guarantees on their generalization performance with fast rates. The main technical difficulty in deriving generalization bounds for the proposed algorithms is that the objective function for AUC maximization involves statistically dependent pairs of examples. To this end, we introduce a new error decomposition and develop fast rates through a novel combination of peeling techniques for Rademacher averages and the properties of U-statistics.

4. We focus on the privacy protection of sensitive data and develop a distributed logistic regression algorithm that satisfies differential privacy. Distributed differential privacy is achieved by perturbing the output of the distributed algorithm. Further, to prevent privacy leakage during the interaction between machines, we propose a distributed logistic variable-perturbation algorithm based on ADMM, and provide theoretical bounds for the algorithms. Experiments show that the proposed algorithms can effectively analyze distributed data while protecting its privacy.
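To make the gradient-perturbation idea in contribution 2 concrete, here is a minimal generic DP-SGD sketch: each per-example gradient is clipped to a fixed norm and Gaussian noise calibrated to that norm is added before the update. This is a standard textbook mechanism, not the dissertation's exact algorithm; the function name and all hyperparameters (`clip`, `noise_mult`, learning rate) are illustrative assumptions.

```python
import numpy as np

def dp_sgd(X, y, epochs=5, lr=0.1, clip=1.0, noise_mult=1.5, seed=0):
    """Noisy SGD for logistic regression (illustrative sketch).

    Each per-example gradient is clipped to L2 norm `clip`, then
    Gaussian noise with standard deviation `noise_mult * clip` is
    added (the Gaussian mechanism), so each update has bounded
    sensitivity with respect to any single example.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            # gradient of the logistic loss log(1 + exp(-y w.x))
            g = -y[i] * X[i] / (1.0 + np.exp(margin))
            # clip the per-example gradient to norm at most `clip`
            g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))
            # perturb with Gaussian noise scaled to the clipping norm
            g += noise_mult * clip * rng.normal(size=d)
            w -= lr * g
    return w
```

The trade-off studied in the dissertation shows up here directly: more iterations improve optimization but force more noise injections for the same privacy budget.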
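Output perturbation, used in contributions 3 and 4, instead trains to (near-)optimality and adds noise only to the released minimizer. The sketch below applies it to ℓ2-regularized logistic regression under the classical Chaudhuri-Monteleoni-style assumptions (per-example losses 1-Lipschitz and features normalized so that ||x|| ≤ 1), under which the minimizer's L2 sensitivity is 2/(nλ); the function name, constants, and solver settings are assumptions for illustration, not the dissertation's algorithm for AUC maximization or the distributed setting.

```python
import numpy as np

def output_perturbed_logreg(X, y, lam=0.1, epsilon=1.0,
                            iters=1000, lr=0.1, seed=0):
    """Output perturbation for L2-regularized logistic regression
    (illustrative sketch).

    Train by plain gradient descent, then release w plus noise with
    density proportional to exp(-epsilon * ||b|| / sensitivity),
    where the sensitivity 2/(n*lam) assumes 1-Lipschitz losses and
    ||x|| <= 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        z = -y * (X @ w)
        grad = (X * (-y / (1 + np.exp(-z)))[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    # high-dimensional Laplace-type noise: uniform direction, Gamma radius
    sensitivity = 2.0 / (n * lam)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=sensitivity / epsilon)
    return w + radius * direction
```

Because the noise scale depends only on the strong convexity of the objective, the mechanism is a natural fit for the regularized ERM formulations of AUC maximization and distributed logistic regression discussed above.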
Keywords/Search Tags:Differential privacy, Sparse, SGD, AUC maximization, Empirical risk minimization, Distributed algorithm