
Research And Implementation On Privacy Protection Technology For Training Samples In Machine Learning

Posted on: 2021-12-13    Degree: Master    Type: Thesis
Country: China    Candidate: J Chen    Full Text: PDF
GTID: 2518306308970239    Subject: Cyberspace security
Abstract/Summary:
In recent years, with the continuous development and maturation of the underlying theory, machine learning technology has been widely adopted across industries. Machine learning algorithms learn statistical knowledge from training data and predict unknown cases to assist human decision-making. Sufficient data is an essential condition for machine learning, so massive amounts of data are collected and used to improve model performance. In some application scenarios, such as medical treatment and personalized recommendation, user data inevitably involves personal privacy that users are reluctant to disclose. Both the availability (utility) and the privacy of user data should be considered when such data are collected and used.

To protect data privacy, this paper proposes a scheme based on particle swarm optimization that generates a new data set from the original data set and trains the model on the new data set, so that the model never directly touches the original data. This paper first examines the membership inference attack, which determines whether a given record was in the model's training set. Based on the attack's algorithmic principle and its effectiveness on several public data sets, the paper analyzes which weaknesses the attack exploits and how sensitive it is to differently distributed data, and summarizes the characteristics of data and models that resist the attack well. Guided by these findings, this paper proposes a method for generating a new data set from the original one: particle swarm migration. The method takes into account both the weaknesses exploited by the attack and the utility of the data, so that the generated data set is difficult to attack while the loss of model accuracy remains controllable. In addition, noise is added to the gradients during model training so that the training process satisfies differential privacy.

Comprehensive experiments are conducted on the MNIST data set. The results show that the proposed scheme resists the membership inference attack well with only a small loss of model accuracy, and that the particle swarm sample migration method offers better protection and a smaller accuracy loss than the random-noise method. Based on the proposed algorithm, this paper designs and implements a machine learning model training system that protects the privacy of training samples, describes its main functional modules and workflow in detail, and demonstrates its usability through experiments. The results show that the system protects the privacy of the training data well with a controlled loss of accuracy.
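To make the threat model concrete, the following is a minimal sketch of one common form of membership inference: a confidence-threshold attack, which exploits the tendency of overfitted models to be more confident on their training examples. The names `target_model`, `records`, and `threshold` are illustrative assumptions (the thesis may use a shadow-model variant); the sketch assumes a scikit-learn-style classifier exposing `predict_proba`.

```python
import numpy as np

def confidence_attack(target_model, records, threshold=0.9):
    """Guess membership from prediction confidence.

    A model is often more confident on examples it was trained on,
    so a high top-class probability is taken as evidence that the
    record was a member of the training set. This is only a sketch
    of the general attack idea, not the thesis's exact procedure.
    """
    probs = target_model.predict_proba(records)   # shape (n_records, n_classes)
    confidence = probs.max(axis=1)                # top-class probability per record
    return confidence >= threshold                # True = "member" guess
```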
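The gradient-noise component can likewise be illustrated with a simplified training step: the batch gradient is clipped and Gaussian noise is added before the optimizer update. This is a sketch in PyTorch under assumed settings; a formal differential privacy guarantee (as in DP-SGD) additionally requires per-example clipping and privacy accounting, and `clip_norm` and `sigma` here are illustrative values, not the parameters used in the thesis.

```python
import torch

def noisy_sgd_step(model, loss, optimizer, clip_norm=1.0, sigma=0.5):
    """One training step with a clipped, noised batch gradient.

    Simplified illustration of noise injection into gradients:
    clip the overall gradient norm, add Gaussian noise scaled by
    sigma * clip_norm, then apply the optimizer update.
    """
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            p.grad += torch.randn_like(p.grad) * sigma * clip_norm
    optimizer.step()
```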
Keywords/Search Tags:Machine learning, Privacy protection, Differential privacy, Particle swarm optimization