As a fundamental national strategic resource, data is rapidly becoming a new driving force and engine of global economic growth. How to effectively protect data privacy while exploring and unlocking the potential value of data, and accelerating open data sharing, is a challenging problem currently faced by both academia and industry. Differential privacy is a privacy-preserving model that rests on solid mathematical theory and is independent of the adversary's background knowledge; it has been widely adopted across machine learning fields that require massive amounts of data. In this thesis, we focus on regression analysis, a classic family of machine learning algorithms, and study key techniques for privacy-preserving machine learning based on differential privacy. To address the challenging trade-off between utility and privacy in existing differentially private regression algorithms, we propose two high-utility differentially private regression analysis algorithms for two typical learning scenarios, centralized learning and federated learning, which employ a phased noise-addition technique and an adaptive noise technique, respectively. The main contributions of this work are as follows:

1. In the centralized learning scenario, we propose a differentially private linear regression algorithm based on principal component analysis and the functional mechanism (PCAFM-DPLR). Linear regression is one of the most fundamental and classical models in machine learning, and research on its differentially private variants has received significant attention in the privacy-preserving machine learning community. However, most existing differentially private linear regression algorithms rely on the Laplace mechanism, which suffers from large global sensitivity and poor model utility. To improve utility, PCAFM-DPLR replaces the traditional Laplace mechanism with the Gaussian mechanism, whose distribution is more tightly concentrated, and extends the common single-phase noise addition of existing algorithms to inject noise in each of the algorithm's two main phases. First, because the Gaussian distribution concentrates more tightly around zero than the Laplace distribution, less noise is added under the same privacy budget. Second, by incorporating principal component analysis, the regression algorithm is divided into two stages, dimensionality-reduction mapping and linear regression, and the model is perturbed by adding Gaussian noise to the covariance matrix in the dimensionality-reduction stage and to the coefficients of the expanded objective function in the regression stage. Theoretical analysis and experimental results demonstrate that the linear regression model trained with PCAFM-DPLR effectively guarantees privacy while maintaining good utility.

2. In the federated learning scenario, we propose a privacy-preserving federated logistic regression algorithm based on an adaptive functional mechanism (AFM-FedLR). Federated learning enables joint modeling across devices without sharing local data, and differential privacy provides a strong guarantee against the privacy risks that arise during this joint modeling. For logistic regression tasks, which are common in machine learning, existing differentially private federated learning solutions suffer from shortcomings such as large global privacy loss and fixed allocation of the privacy budget. To address these problems, this thesis proposes AFM-FedLR for the horizontal federated learning scenario. First, the objective function of logistic regression is approximated by its second-order Taylor expansion, and Gaussian noise is injected into the expansion coefficients, reducing the amount of noise added to the objective. Second, each participant measures the correlation between inputs and outputs using a layer-wise relevance propagation algorithm and further reduces noise injection by adaptively and dynamically allocating the privacy budget according to the strength of these correlations. Each participant then uses the perturbed objective function as its new optimization objective and cooperates with the central server to obtain the optimal model parameters. Privacy analysis and experimental results demonstrate that the global privacy loss of the proposed scheme is independent of the number of training iterations, and that it effectively protects users' private data while reducing the loss of model accuracy.
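The functional-mechanism idea underlying both contributions, perturbing the coefficients of an expanded objective with Gaussian noise rather than perturbing the data or the final model, can be sketched for the linear regression case roughly as follows. This is a minimal illustrative sketch, not the thesis's implementation: the function name, the sensitivity constant (which assumes each feature vector and label is normalized to [-1, 1]), and the small regularization term are our own illustrative choices.

```python
import numpy as np

def fm_gaussian_linear_regression(X, y, epsilon, delta, rng=None):
    """Sketch of functional-mechanism linear regression with Gaussian noise.

    The quadratic loss sum_i (y_i - x_i @ w)**2 expands into polynomial
    coefficients lam1 (linear term) and lam2 (quadratic term); Gaussian
    noise is added to those coefficients, and the perturbed objective is
    then minimized in closed form. The sensitivity bound below assumes
    features and labels lie in [-1, 1] and is illustrative only.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Coefficients of the expanded objective (constant term omitted:
    # it does not affect the minimizer).
    lam1 = -2.0 * X.T @ y          # linear term,    shape (d,)
    lam2 = X.T @ X                 # quadratic term, shape (d, d)
    # Illustrative L2 sensitivity of the coefficient vector w.r.t. one record.
    sensitivity = 2.0 * (d + d * d) ** 0.5
    # Classic Gaussian-mechanism calibration for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    lam1_noisy = lam1 + rng.normal(0.0, sigma, size=lam1.shape)
    noise = rng.normal(0.0, sigma, size=lam2.shape)
    lam2_noisy = lam2 + (noise + noise.T) / 2.0   # keep the matrix symmetric
    # Small ridge term to keep the perturbed quadratic well-conditioned.
    lam2_noisy += 1e-3 * np.eye(d)
    # Minimize w @ lam2 @ w + lam1 @ w  =>  solve 2 * lam2 @ w = -lam1.
    return np.linalg.solve(2.0 * lam2_noisy, -lam1_noisy)
```

With a very loose privacy budget the added noise is negligible and the result approaches the ordinary least-squares solution, while tighter budgets trade accuracy for privacy, which is exactly the utility-privacy trade-off the two proposed algorithms aim to improve.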