
Research In Defense Technologies Against Backdoor Attacks On Deep Neural Network

Posted on: 2024-03-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y G Li
Full Text: PDF
GTID: 1528307340974389
Subject: Information security
Abstract/Summary:
Artificial intelligence (AI) technology has made significant progress and plays an important role in fields such as facial recognition and autonomous driving. However, deep neural networks (DNNs) are vulnerable to backdoor attacks due to their black-box nature and lack of interpretability. Backdoor attacks occur during the model training phase, when an adversary intentionally injects backdoor triggers into DNNs and manipulates their prediction behavior to achieve the attack objective. These attacks are difficult to detect because the models behave normally on clean inputs and produce the attacker-specified labels only when the inputs contain the backdoor trigger. Backdoor attacks are prevalent in scenarios such as pre-trained models, outsourced training, transfer learning, and multi-party collaborative training, posing a severe threat to security-critical applications of DNNs in the real world. This thesis focuses on backdoor attacks in image classification tasks and proposes effective defense methods against them during both the model training and deployment stages, ensuring secure deployment and a reliable model supply chain. The detailed descriptions are as follows:

1. Model Training Stage: Robust model training with Anti-Backdoor Learning (ABL), which enables training benign models on backdoored datasets. The training of deep models typically relies on large-scale datasets sourced from web scraping or untrusted third-party data platforms, so these datasets may contain backdoored samples maliciously inserted by attackers. To train benign models on poisoned datasets, this work frames the entire training process as a dual task of learning clean data and learning backdoor data, and identifies two inherent weaknesses of backdoor attacks: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on the backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, this thesis proposes a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. Specifically, ABL introduces a two-stage gradient ascent mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks demonstrate that ABL effectively mitigates the impact of backdoor attacks and achieves accuracy comparable to that of models trained on clean data.

2. Model Deployment Stage: Backdoor model purification through Neural Attention Distillation (NAD). This research is the first to establish a connection between knowledge distillation and backdoor defense. During the model deployment phase, a backdoored model embedded with triggers behaves normally on clean samples and produces erroneous predictions only when the trigger pattern is present. It is therefore necessary to purify the backdoored model and remove the malicious behavior embedded within it. This thesis proposes a novel defense framework, Neural Attention Distillation (NAD), to remove backdoor triggers from the backdoored model. Specifically, NAD uses a teacher network to guide the fine-tuning of the backdoored student network on a small subset of clean data, aligning the attention of the student network's intermediate layers with that of the teacher network. The teacher network can be obtained through independent fine-tuning on the same clean subset. Experimental results demonstrate that NAD can effectively defend against 10 state-of-the-art backdoor attacks using only 5% clean data while leaving performance on clean samples nearly unaffected.

3. Model Deployment Stage: Backdoor model purification through Reconstructive Neuron Pruning (RNP). Although current defense methods have made progress in removing backdoor triggers, identifying the backdoor neurons associated with trigger behavior remains an open question. This thesis proposes a novel defense method, Reconstructive Neuron Pruning (RNP), to address this challenge. Specifically, RNP maximizes the model's loss on a small subset of clean samples through an unlearning operation and then minimizes the model's loss on the same data through a recovering operation to expose the backdoor-related neurons. In RNP, unlearning operates at the neuron level while recovering operates at the filter level, forming an asymmetric reconstructive learning procedure. We show that this asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks, achieving new state-of-the-art defense performance. Experimental results demonstrate that RNP effectively defends against 12 advanced backdoor attacks and achieves the best defense performance in backdoor trigger removal. Furthermore, the intermediate models obtained through RNP's unlearning operation can be combined with various mainstream defense strategies, including backdoor removal, backdoor trigger recovery, backdoor label detection, and backdoor sample detection, to further enhance their defense performance.

4. A One-stop Backdoor Defense Framework: Although current backdoor defense methods have made progress on individual tasks such as backdoor detection or removal, these techniques are relatively scattered and lack usability in industrial scenarios. To address this, we introduce a one-stop backdoor defense framework that integrates "backdoor model detection - backdoor label detection - trigger reverse engineering - backdoor removal." A key technical challenge in this defense system lies in effectively revealing and exposing backdoor neurons. To tackle this problem, we propose a "Neural Unlearning" (NU) technique that suppresses benign neurons while preserving and exposing the internal backdoor neurons of deep models. The objective of NU is to retain the backdoor parameters (neurons) to the maximum extent while suppressing clean parameters (neurons), yielding a neural-unlearned model that strongly exposes backdoor neurons. Interestingly, the neural-unlearned model helps infer suspicious backdoor labels and improves trigger reverse engineering, thus facilitating backdoor trigger removal. The NU technique also enhances various downstream defense tasks. In summary, NU breaks down the technical barriers among existing backdoor defense methods by effectively integrating backdoor detection, reverse engineering, and trigger removal, promoting the construction of a one-stop defense system covering "backdoor model detection - backdoor label detection - trigger reverse engineering - backdoor removal."
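The two-stage mechanism of ABL described in item 1 can be sketched in plain Python on a toy 1-D logistic model (not the thesis implementation; the sample data, isolation ratio, and learning rate below are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def isolate_low_loss(w, batch, ratio):
    """Early-stage isolation: backdoored samples are learned fastest, so the
    lowest-loss fraction of the batch is flagged as suspected backdoor data."""
    losses = []
    for i, (x, y) in enumerate(batch):
        p = min(max(sigmoid(w * x), 1e-9), 1.0 - 1e-9)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        losses.append((bce, i))
    losses.sort()
    k = max(1, int(ratio * len(batch)))
    return {i for _, i in losses[:k]}

def abl_step(w, batch, isolated, lr=0.1):
    """Later-stage update: standard gradient descent on clean samples and
    sign-flipped gradient *ascent* on isolated (suspected backdoor) samples,
    breaking their correlation with the target class."""
    grad = 0.0
    for i, (x, y) in enumerate(batch):
        g = (sigmoid(w * x) - y) * x  # d(BCE)/dw for one sample
        grad += -g if i in isolated else g
    return w - lr * grad / len(batch)
```

Here the low-loss sample is unlearned while the rest are fitted normally; the real ABL applies the same idea per-example across a full DNN training run.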
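The attention alignment at the heart of NAD (item 2) can be sketched without tensor libraries. Here a feature block is a plain list of channels, the attention operator (sum of squared activations over channels) is written out directly, and the distillation term is the L2 distance between normalized maps; this is a minimal sketch of the idea, not the thesis implementation:

```python
import math

def attention_map(feats):
    """Collapse a [channel][position] activation block into a normalized
    spatial attention map by summing squared activations over channels."""
    n_pos = len(feats[0])
    amap = [sum(ch[p] ** 2 for ch in feats) for p in range(n_pos)]
    norm = math.sqrt(sum(a * a for a in amap)) or 1.0
    return [a / norm for a in amap]

def nad_loss(student_feats, teacher_feats):
    """Distillation term added to the fine-tuning loss: L2 distance between
    the student's and teacher's normalized attention maps."""
    s = attention_map(student_feats)
    t = attention_map(teacher_feats)
    return sum((a - b) ** 2 for a, b in zip(s, t))
```

Fine-tuning the student on a small clean subset while minimizing this term pulls its intermediate-layer attention toward the (backdoor-free) teacher's.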
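The recovering phase of RNP (item 3) can be illustrated on a toy linear model: the unlearned weights are frozen and a per-neuron mask in [0, 1] is fitted to minimize clean loss; neurons whose mask stays low were not needed to restore clean accuracy and are pruned as backdoor neurons. The two-neuron model, data, and learning rate below are illustrative assumptions, not the thesis setup:

```python
def rnp_recover_masks(w, data, steps=100, lr=0.05):
    """Recovering phase (sketch): with weights w frozen, learn per-neuron
    masks m by gradient descent on the mean squared error over a few
    clean samples, clamping each mask to [0, 1]."""
    m = [1.0] * len(w)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for x, y in data:
            err = sum(m[j] * w[j] * x[j] for j in range(len(w))) - y
            for j in range(len(w)):
                grads[j] += 2.0 * err * w[j] * x[j]
        for j in range(len(w)):
            m[j] = min(1.0, max(0.0, m[j] - lr * grads[j] / len(data)))
    return m

def prune_backdoor_neurons(masks, threshold=0.5):
    """Neurons the clean task did not re-enable are treated as backdoor
    neurons and pruned."""
    return [j for j, mv in enumerate(masks) if mv < threshold]
```

In this toy setup neuron 0 fits the clean data while neuron 1 (standing in for a backdoor neuron) only hurts it, so its mask collapses toward zero and it is pruned.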
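The Neural Unlearning step in item 4 can be reduced to its core idea: given some per-neuron importance score for the clean task (however estimated; the scoring itself is out of scope here), suppress the most clean-important neurons so that backdoor neurons dominate the remaining model. The mask construction below is a minimal sketch under that assumption:

```python
def neural_unlearn_mask(clean_importance, suppress_ratio=0.5):
    """NU sketch: return a 0/1 mask that zeroes the fraction of neurons most
    important to the clean task, leaving backdoor neurons preserved and
    exposed in the resulting neural-unlearned model."""
    order = sorted(range(len(clean_importance)),
                   key=lambda i: clean_importance[i], reverse=True)
    k = int(suppress_ratio * len(order))
    mask = [1] * len(order)
    for i in order[:k]:
        mask[i] = 0
    return mask
```

A model masked this way tends to route clean inputs toward the backdoor behavior, which is what makes suspicious-label inference and trigger reverse engineering easier downstream.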
Keywords/Search Tags: Artificial Intelligence, Deep Neural Network, Backdoor Attack, Backdoor Defense