Numerous studies have shown that deep neural networks are vulnerable to adversarial examples: by adding small, human-imperceptible perturbations, images can easily mislead the predictions of deep neural network models. Worse still, adversarial examples have been shown to be transferable, i.e., an adversarial example generated on one model can successfully attack another model with high probability. Hence, an attacker can attack an unseen black-box target model by crafting adversarial examples on a substitute model; this type of attack is commonly known as a transfer-based attack. While encouraging empirical results have been achieved, a theoretical analysis guaranteeing the success of such attacks is still absent. The main contributions of this thesis are as follows:

1. This thesis presents a generalization error bound for black-box targeted attacks. The bound reveals that the attack error on a target model depends mainly on the empirical attack error on the substitute model and on the maximum model discrepancy among substitute models (a schematic form of the bound is sketched after this list).

2. Based on this theoretical analysis, this thesis designs a new approach for black-box targeted attacks that additionally minimizes the maximum model discrepancy of the substitute models when training the generator that produces adversarial examples. Specifically, the two substitute models are trained to keep their model discrepancy as large as possible, while the generator is trained to craft adversarial examples that attack both substitute models and simultaneously minimize the discrepancy between them. In other words, the generator and the two substitute models are trained in an adversarial manner, playing a min-max game over the model discrepancy (a training sketch is given after this list). In this way, the generator learns to produce adversarial examples that are robust to variations of the substitute model and is therefore capable of attacking the black-box target model with a high success rate.

3. This thesis conducts extensive experiments on the ImageNet dataset with various benchmark models. The proposed approach outperforms current state-of-the-art methods by a significant margin across a wide range of attack settings, with especially large improvements when the black-box target model has a large model discrepancy from the substitute model.
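Stated schematically, the bound in contribution 1 has the following form. The notation here is illustrative rather than the thesis's exact statement: $\mathrm{Err}_{T}$ denotes the attack error on the target model, $\widehat{\mathrm{Err}}_{S}$ the empirical attack error on the substitute model, $\mathcal{F}$ the family of substitute models, $D$ a model-discrepancy measure, and $C$ the residual complexity and confidence terms of a standard generalization bound:

\[
\mathrm{Err}_{T}(G) \;\le\; \widehat{\mathrm{Err}}_{S}(G) \;+\; \max_{f_1, f_2 \in \mathcal{F}} D(f_1, f_2) \;+\; C .
\]

Read this way, the bound says that driving down the empirical attack error on the substitute alone is not enough; the attacker should also control the worst-case discrepancy term, which is exactly what the method in contribution 2 does.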
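As a concrete illustration of the min-max game in contribution 2, below is a minimal PyTorch-style sketch of one alternating training step, assuming a targeted attack toward class `target` under an L-infinity budget `eps`. All names (`G`, `f1`, `f2`, `perturb`, `eps`) and the specific discrepancy measure are assumptions made for illustration, not the thesis's actual implementation.

```python
import torch
import torch.nn.functional as F

def discrepancy(logits1, logits2):
    # One plausible discrepancy measure: mean L1 distance between the two
    # substitutes' softmax outputs (the thesis may define it differently).
    return (logits1.softmax(1) - logits2.softmax(1)).abs().mean()

def perturb(G, x, eps):
    # Keep the generator's perturbation inside an L-infinity ball of radius
    # eps and the adversarial image inside the valid pixel range [0, 1].
    return (x + eps * torch.tanh(G(x))).clamp(0.0, 1.0)

def min_step(G, f1, f2, x, target, eps, opt_G):
    # Generator step: push BOTH substitutes toward the target class while
    # MINIMIZING the discrepancy between them on the adversarial examples.
    x_adv = perturb(G, x, eps)
    out1, out2 = f1(x_adv), f2(x_adv)
    loss = (F.cross_entropy(out1, target)
            + F.cross_entropy(out2, target)
            + discrepancy(out1, out2))
    opt_G.zero_grad(); loss.backward(); opt_G.step()

def max_step(G, f1, f2, x, eps, opt_f):
    # Substitute step: with the generator held fixed, update f1 and f2 to
    # MAXIMIZE their discrepancy on the current adversarial examples.
    with torch.no_grad():
        x_adv = perturb(G, x, eps)
    loss = -discrepancy(f1(x_adv), f2(x_adv))
    opt_f.zero_grad(); loss.backward(); opt_f.step()
```

Alternating `min_step` and `max_step` trains the generator against the most disagreeing pair of substitutes it can find, which is what makes the resulting adversarial examples robust to substitute-model variation.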