Speaker recognition is a biometric technology that verifies identity by voice. However, deep neural network-based speaker recognition systems have gradually been shown to be vulnerable to adversarial attacks. With the rapid development of adversarial techniques, adversarial examples have become one of the main threats to artificial intelligence systems and have attracted extensive research attention. Recent studies show that adversarial attacks have evolved from white-box to gray-box and then to black-box scenarios, from the digital domain to the physical domain, from single-sample attacks to generalized attacks, and from offline attacks to real-time attacks. This evolution offers new perspectives on adversarial defense and robustness, but it also poses significant challenges. In this paper, we focus on the adversarial robustness of speaker recognition systems and explore the essential characteristics of adversarial examples from two dimensions: adversarial attack and adversarial defense. Specifically, on the one hand, we study the transferability of adversarial examples from the attacker's perspective and pursue more powerful adversarial examples; on the other hand, we propose an attack-agnostic perturbation purification method from the defender's perspective. The problems addressed and the corresponding contributions of this paper are as follows:

(1) Gradient-based adversarial attack algorithms tend to become trapped in local optima and to overfit a single surrogate model, which weakens the transferability of the resulting adversarial examples. To overcome this, we propose a Newton-accelerated gradient method that uses a forward (look-ahead) gradient search and dynamically updates the step size. By optimizing the gradient direction, the method accelerates perturbation updates, escapes local optima, and improves the transferability of adversarial examples (a minimal sketch follows this abstract).

(2) To address the poor black-box transferability caused by insufficient use of gradient information during adversarial example generation, we propose a multi-step iterative method based on temporal momentum and spatial momentum. The method refines and corrects the gradient update direction using gradient information from both the neighborhood space of the sample and the sample's internal space, yielding adversarial examples with stronger attack transferability (see the second sketch below).

(3) To address the accuracy loss that input reconstruction-based defenses impose on the original model, we propose using a diffusion model to purify adversarial perturbations in speaker audio: the input is denoised so that the adversarial perturbations lose their threat capability (see the third sketch below).

Through large-scale experiments with 9 models across 3 network architectures, we first verify the black-box transferability of the proposed attack methods against existing defenses, and then demonstrate the advantages of the proposed defense method over existing ones against the aforementioned attacks. By studying both attack and defense, we aim to enhance the robustness and security of speaker recognition systems and to provide technical support for the secure deployment of speaker recognition technology.
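The following is a minimal sketch of the idea behind contribution (1), assuming a PyTorch speaker classifier `model` that returns logits and a cross-entropy loss. The look-ahead probe, the geometric step-size decay, and all hyperparameter values are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def newton_accelerated_attack(model, x, y, eps=0.002, steps=10, decay=0.9):
    """L_inf-bounded iterative attack: the gradient is probed one step ahead
    of the current iterate (forward gradient search) and the step size is
    updated dynamically (here, a simple geometric decay -- an assumption)."""
    alpha = eps / steps                          # initial step size
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                      # previous update direction
    for _ in range(steps):
        # Forward (look-ahead) point: evaluate the loss one step ahead.
        x_look = (x_adv + alpha * g.sign()).detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_look), y)
        grad = torch.autograd.grad(loss, x_look)[0]
        g = grad / (grad.abs().mean() + 1e-12)   # normalized direction
        x_adv = x_adv + alpha * g.sign()         # untargeted: ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # L_inf projection
        x_adv = x_adv.clamp(-1.0, 1.0)           # keep waveform in valid range
        alpha = alpha * decay                    # dynamic step-size update
    return x_adv.detach()
```

Probing the loss one step ahead of the current iterate lets the update anticipate the local curvature, which is what helps the perturbation cross shallow local optima instead of oscillating inside them.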
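For contribution (2), one plausible reading (an assumption on our part) combines temporal momentum, accumulated across iterations as in MI-FGSM, with spatial momentum, obtained by averaging gradients over randomly sampled neighbors of the current iterate. The sampling radius, neighbor count, and decay factor below are illustrative.

```python
import torch
import torch.nn.functional as F

def spatiotemporal_momentum_attack(model, x, y, eps=0.002, steps=10,
                                   mu=1.0, n_neighbors=5, radius=0.001):
    """Multi-step attack combining spatial momentum (gradients averaged over
    a sampled neighborhood of the iterate) with temporal momentum
    (decay-weighted accumulation across iterations)."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    g_t = torch.zeros_like(x)                    # temporal momentum accumulator
    for _ in range(steps):
        # Spatial momentum: average gradients over random neighbors of x_adv.
        g_s = torch.zeros_like(x)
        for _ in range(n_neighbors):
            x_n = x_adv + torch.empty_like(x).uniform_(-radius, radius)
            x_n = x_n.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_n), y)
            g_s = g_s + torch.autograd.grad(loss, x_n)[0]
        g_s = g_s / n_neighbors
        # Temporal momentum: stabilize the update direction across steps.
        g_t = mu * g_t + g_s / (g_s.abs().mean() + 1e-12)
        x_adv = x_adv + alpha * g_t.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # L_inf projection
        x_adv = x_adv.clamp(-1.0, 1.0)
    return x_adv.detach()
```

Averaging over a neighborhood smooths out gradients that are idiosyncratic to the surrogate model, which is the usual mechanism by which such corrections improve black-box transferability.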
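Contribution (3) can be sketched as standard DDPM-style purification: the adversarial audio is diffused forward to an intermediate timestep, drowning the small perturbation in Gaussian noise, and then denoised back with a pretrained diffusion model. Here `denoiser(x, t)` is a hypothetical noise-prediction network, and the linear beta schedule and `t_star` are assumptions.

```python
import torch

@torch.no_grad()
def diffusion_purify(denoiser, x_adv, t_star=50, n_steps=1000):
    """Purify (possibly adversarial) audio by diffusing it to timestep t_star
    and denoising back to t = 0 with DDPM ancestral sampling.
    `denoiser(x, t)` is a hypothetical pretrained noise-prediction network."""
    betas = torch.linspace(1e-4, 0.02, n_steps)  # linear schedule (assumption)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)          # cumulative products of alphas
    # Forward diffusion: drown the small adversarial perturbation in noise.
    x_t = (abar[t_star].sqrt() * x_adv
           + (1 - abar[t_star]).sqrt() * torch.randn_like(x_adv))
    # Reverse denoising from t_star back to 0.
    for t in range(t_star, -1, -1):
        eps_hat = denoiser(x_t, torch.tensor([t]))
        mean = (x_t - betas[t] / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x_t = mean + betas[t].sqrt() * torch.randn_like(x_t) if t > 0 else mean
    return x_t                                    # purified audio for the SR model
```

Because the purifier only denoises the input and never inspects how the perturbation was generated, it matches the attack-agnostic defense described above and can be placed in front of any speaker recognition model without retraining it.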