
Research On Training Data Security In Multi-party Deep Learning Scenarios

Posted on: 2021-04-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y Y Xiong    Full Text: PDF
GTID: 2428330647451061    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of neural networks, machine learning tasks have come to be used in all aspects of production and daily life. Because neural networks have complex structures and large numbers of neurons, they can memorize a great deal of information, including both normal training information and maliciously embedded information. When multiple participants take part in training, a malicious participant may modify the training data to inject abnormal information into the neural network. The two multi-party training scenarios in wide use today are distributed training and third-party training, and we study the data security problems in each of them.

In distributed training, multiple participants train a model together. Their data and models are kept locally, and gradient updates are shared through a parameter server. The data privacy of individual trainers appears to be protected, but research has shown that a malicious trainer can steal data from a class that does not belong to it through the shared gradient updates. If the attacker trains a GAN locally and adds the fake data generated by the GAN to its local training, it can prompt the victim to disclose more information about the target class. Current defenses against such attacks are based on differential privacy, cryptography, or trusted execution environments; some of these methods reduce the accuracy of the trained model, some incur large computational overhead, and some cannot be widely deployed because of hardware constraints. We seek a solution that does not interfere with model training, and we are the first to approach the GAN-based attack from the perspective of detection. Our detection method only needs to analyze the gradient updates uploaded by each trainer and does not change the training process of the model. Detection is transparent to normal training users and can identify the attacker within a small number of training epochs at the beginning of training. We conduct extensive experiments on MNIST and AT&T to demonstrate the accuracy and effectiveness of the detection method.

The third-party training scenario arises from the huge computing and data storage cost of training neural networks. Because many ordinary users cannot afford the cost of training a neural network, especially a deep one, they often outsource the training task to a third-party server. If a malicious server poisons the training data, a trigger can be inserted into the model, mounting a backdoor attack: a backdoored model classifies clean data accurately but misclassifies data carrying the trigger. Because backdoor attacks continually threaten the security and safety of machine learning tasks, there is a large body of research on backdoor attacks and their detection. State-of-the-art backdoor detections have made great progress by reconstructing backdoor triggers and performing outlier detection. We propose two new backdoor attacks that can be inserted without being caught by this kind of detection. We conduct extensive attack experiments on three data sets, MNIST, GTSRB, and YouTube Faces, to demonstrate the invisibility and effectiveness of the proposed attacks. For these two more covert backdoor attacks, we also propose a possible defense scheme, which we hope will inform future detection and defense against backdoor attacks.
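To make the distributed-training setting above concrete, the following is a minimal illustrative sketch of flagging a suspicious trainer from the gradient updates it uploads in a single round. The scoring rule (cosine similarity to the round-average update, combined with a z-score threshold) and the function name `flag_suspicious_trainers` are assumptions made for illustration only; the thesis's actual detection statistic is not reproduced here.

```python
# Hypothetical sketch: flag a participant whose uploaded gradient update is a
# statistical outlier relative to the other trainers in the same round.
# The scoring rule (cosine similarity to the mean update) is an assumption,
# not the detection statistic used in the thesis.
import numpy as np

def flag_suspicious_trainers(updates, threshold=2.0):
    """updates: list of 1-D numpy arrays, one flattened gradient update per trainer.
    Returns indices of trainers whose update deviates strongly from the rest."""
    stacked = np.stack(updates)                       # (n_trainers, n_params)
    mean_update = stacked.mean(axis=0)
    # Cosine similarity between each trainer's update and the round average.
    sims = np.array([
        np.dot(u, mean_update) /
        (np.linalg.norm(u) * np.linalg.norm(mean_update) + 1e-12)
        for u in stacked
    ])
    # A trainer is suspicious if its similarity falls far below the group's norm.
    z_scores = (sims - sims.mean()) / (sims.std() + 1e-12)
    return [i for i, z in enumerate(z_scores) if z < -threshold]

# Example: nine benign trainers plus one whose update points in a different direction.
rng = np.random.default_rng(0)
benign = [rng.normal(0, 1, 1000) + 0.5 for _ in range(9)]
attacker = [-rng.normal(0, 1, 1000) - 0.5]
print(flag_suspicious_trainers(benign + attacker))    # likely prints [9]
```

In a real parameter-server deployment, such a check could be run on the server over the updates received each epoch, which matches the abstract's claim that detection requires no change to the training process itself.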
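The backdoor attack in the third-party training scenario rests on data poisoning: a trigger pattern is stamped onto a small fraction of training samples, which are then relabelled to an attacker-chosen class. The sketch below shows only this classic trigger-poisoning step, with the patch location, size, poisoning rate, and target label chosen arbitrarily; it does not reproduce the two stealthier attacks proposed in the thesis.

```python
# Hypothetical sketch of the classic trigger-poisoning step: stamp a small white
# square into a fraction of training images and relabel them to an attacker-chosen
# class. Patch location, size, poisoning rate, and target label are assumptions.
import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   patch_size=3, rng=None):
    """images: (N, H, W) float array in [0, 1]; labels: (N,) int array.
    Returns poisoned copies of both arrays."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp the trigger in the bottom-right corner and flip the label.
    images[idx, -patch_size:, -patch_size:] = 1.0
    labels[idx] = target_label
    return images, labels

# Example on random MNIST-sized data: 5% of samples carry the trigger and point
# to class 0, so a model trained on this set can learn the backdoor association.
x = np.random.rand(1000, 28, 28)
y = np.random.randint(0, 10, size=1000)
x_p, y_p = poison_dataset(x, y)
print((y_p == 0).sum() - (y == 0).sum())   # extra class-0 labels added by poisoning
```

Detection methods that reconstruct triggers and then run outlier detection target exactly this kind of small, fixed patch; the attacks proposed in the thesis are designed to evade that reconstruction step.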
Keywords/Search Tags: Distributed learning, Backdoor attack, Outlier detection