
Research On Privacy Preserving Methods In Stochastic Gradient Descent

Posted on: 2022-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: J H Duan
Full Text: PDF
GTID: 2518306770971729
Subject: Automation Technology
Abstract/Summary:
Machine learning allows machines to learn rules from large amounts of historical data and to make judgments, identifications, or predictions on new samples. As the core technology of artificial intelligence, machine learning has been widely applied in computer vision, natural language processing, recommendation systems, and other fields. The convergence of technologies such as edge computing, the Internet of Things, cloud computing, and machine learning has driven rapid progress in areas such as healthcare and banking. However, data in these fields may contain sensitive information, and if it is not handled appropriately, that information may be stolen or leaked, jeopardizing the privacy and security of data holders. The security of machine learning is therefore critical to its development.

Gradient descent (GD) plays an important role in a wide range of optimization problems: its structure is simple and easy to implement, its iterations are stable, and its complexity is moderate. However, its security cannot be guaranteed, since data reconstruction attacks can use gradient information to infer the training data. Federated learning is a new distributed machine learning framework that trains models on the data owners' devices without directly sharing the data, so as to protect privacy. Existing research shows, however, that simply not sharing the training data is insufficient to protect data privacy: because such parameter-server architectures typically require workers to share gradients or model parameters for joint learning, attackers can exploit this shared information to infer user privacy. This thesis therefore analyzes the information-leakage problem of traditional gradient descent in neural networks and the shortcomings of existing protection methods, proposes a super stochastic gradient descent method, and applies it to federated learning scenarios. The main research results are as follows:

(1) To address the privacy leakage caused by gradient information in machine learning systems, this thesis proposes Super Stochastic Gradient Descent (SSGD), which updates the parameters after hiding the modulus length of each gradient vector by converting it to a unit vector. Clipping away the modulus length makes the aggregated gradient direction super-random. To prevent this super-randomness from degrading training, the sum of the gradients of a small batch is treated as a base gradient, and the parameters are updated with the sum of the unit vectors of multiple base gradients. In addition, the security of SSGD is analyzed, showing that the proposed algorithm can resist gradient attacks while maintaining model accuracy. Finally, comparative experiments on two real-world datasets show that the proposed method significantly outperforms existing gradient descent methods in accuracy, robustness, and adaptability to large-scale batches of data; moreover, SSGD is resistant to poisoning attacks to some extent.

(2) To address the privacy problems caused by directly sharing gradients or model parameters in the Iterative Federated Clustering Algorithm (IFCA), as well as the risk that poor model initialization leads to model homogenization and models too weak to be updated, this thesis proposes Locally Reinforced Federated Learning (LRFL). Worker nodes use super stochastic gradient descent to update their parameters, protecting against the training-data leakage caused by sharing gradients or model parameters. A clustering operation in the first round of server updates automatically selects multiple models, eliminating the influence of model initialization. At the same time, taking the distribution of inter-cluster data into account, the aggregation weights of the model parameters are adjusted according to the differences between model parameters, which increases the robustness of the algorithm. Since the method requires only one clustering operation, at the first parameter update, it also reduces the amount of computation. Finally, the utility and security of LRFL are validated on three datasets.
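The SSGD update described in result (1) can be sketched as follows. This is a minimal NumPy illustration based only on the abstract's description; the batch size, the number of base gradients, and the learning rate are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def ssgd_update(params, grad_fn, data, batch_size=32, n_base=4, lr=0.1):
    """One Super Stochastic Gradient Descent (SSGD) step, per the abstract:
    sum the per-sample gradients of a small batch into a 'base gradient',
    hide each base gradient's modulus length by normalizing it to a unit
    vector, and update with the sum of the unit vectors."""
    unit_grads = []
    for _ in range(n_base):
        # A base gradient is the sum of the gradients of one small batch.
        idx = np.random.choice(len(data), batch_size, replace=False)
        base = sum(grad_fn(params, data[i]) for i in idx)
        # Clip away the modulus length: only the direction survives,
        # which makes the aggregated direction "super-random".
        unit_grads.append(base / (np.linalg.norm(base) + 1e-12))
    return params - lr * sum(unit_grads)
```

Because every base gradient contributes a vector of norm 1, the step length of a single update is bounded by `lr * n_base` regardless of the true gradient magnitudes, which is what hides the modulus-length information from an observer of the shared update.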
Keywords/Search Tags: Machine learning, Gradient descent, Privacy protection, Federated learning, Clustering