
Research On Attack And Defense Of Speaker Recognition Adversarial Example

Posted on: 2024-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
Full Text: PDF
GTID: 2568307073450224
Subject: Cyberspace security
Abstract/Summary:
Speaker recognition exploits the uniqueness of the human voiceprint and is widely used in biometric scenarios, with broad user adoption and acceptance. The rise of deep learning has driven the organic combination of the two technologies, accelerating the development of speaker recognition and significantly improving its performance in every respect. However, adversarial examples, first discovered in the image domain, are hard for humans to detect and seriously threaten the security of deep learning models. This type of attack is now also appearing in speaker recognition tasks, sharply raising the false recognition rate of speaker recognition systems and disrupting their normal use by legitimate users. A better understanding of adversarial examples in speaker recognition therefore helps us secure speaker recognition systems.

As in other application scenarios, methods for generating adversarial examples against speaker recognition fall into two main attack modes: white-box attacks and black-box attacks. Among white-box attacks, gradient-based methods such as FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), and MI-FGSM (Momentum Iterative FGSM) produce adversarial examples that are poorly concealed and easily perceived by the human ear, while optimization-based methods such as CW2 (the Carlini & Wagner attack) are extremely stealthy but require a very large amount of computation time. The transferability of adversarial examples in black-box attacks has not been extensively studied for speaker recognition, so this paper examines the transferability of several existing attack methods. Among defense methods, this paper discusses only adversarial training; existing schemes are based on FGSM and PGD, respectively, and improve robustness only to a limited extent.

To address these problems, this paper studies and discusses the following aspects:

(1) For white-box attacks, a gradient-based, highly stealthy attack method called Adaptive Decay Attack (ADA) is designed. It retains the generation-time advantage of gradient-based methods, maintains a high attack success rate, and comes very close to CW2 in stealthiness, yielding highly imperceptible adversarial examples. Experimental results show that, in the untargeted identification task on the x-vector and i-vector speaker recognition models, stealthiness metrics such as SNR and PESQ improve by at least 30% and 39%, respectively, compared to PGD, the attack with the best performance. In the targeted identification task, SNR and PESQ improve by at least 20% and 25%, respectively. In the speaker verification task, SNR and PESQ improve by at least 29.5% and 33.4%, respectively.

(2) For black-box attacks, this paper explores the transferability of adversarial examples between the x-vector and i-vector models and compares the strengths and weaknesses of several existing attack algorithms. NI-FGSM yields the highest transfer attack success rate, 75.5%, for untargeted attacks in closed-set speaker identification; however, for targeted attacks in closed-set speaker identification and for the speaker verification task, the transferability of existing methods is not strong enough to mount a serious attack on either model.
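As a rough illustration of how such a transfer evaluation can be set up (a sketch under stated assumptions, not the thesis's exact pipeline), the snippet below crafts adversarial waveforms against a surrogate closed-set identification model with a momentum-based iterative gradient attack in the spirit of MI/NI-FGSM, then measures how often they also fool a second model. The model handles, the perturbation budget eps, and all hyper-parameters are illustrative placeholders.

    import torch
    import torch.nn.functional as F

    def momentum_attack(model, wav, label, eps=0.002, steps=10, mu=1.0):
        # Untargeted L_inf attack on a batch of raw waveforms shaped (batch, samples).
        alpha = eps / steps
        delta = torch.zeros_like(wav, requires_grad=True)
        g = torch.zeros_like(wav)
        for _ in range(steps):
            loss = F.cross_entropy(model(wav + delta), label)
            grad, = torch.autograd.grad(loss, delta)
            # Normalise the gradient per example before accumulating momentum.
            norm = grad.abs().flatten(1).sum(dim=1).clamp_min(1e-12).view(-1, 1)
            g = mu * g + grad / norm
            delta = (delta + alpha * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        return (wav + delta).clamp(-1.0, 1.0).detach()

    def transfer_success_rate(surrogate, target, loader):
        # Fraction of examples crafted on `surrogate` that also change the label
        # predicted by `target` (untargeted transfer attack success rate).
        fooled, total = 0, 0
        for wav, label in loader:
            adv = momentum_attack(surrogate, wav, label)
            with torch.no_grad():
                fooled += (target(adv).argmax(dim=-1) != label).sum().item()
            total += label.numel()
        return fooled / total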
For the defense side, an ADA-based adversarial training method is designed. It improves robustness more than the PGD-based method that currently performs best in adversarial training, while requiring less training time. Experimental results show that ADA-based adversarial training takes 28.31% less time than PGD-based adversarial training, and the attack success rates of PGD and ADA decrease from 50.88% to 36.47% and from 64.74% to 45.82%, respectively.
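The abstract does not give ADA's exact update rule, so the sketch below only shows the general shape of such an adversarial training loop, using a plain PGD inner step as a stand-in; the functions pgd_perturb and adversarial_train, the model, the data loader, and the hyper-parameters are assumed placeholders. In an ADA-based variant, the PGD inner step would simply be replaced by the ADA perturbation routine.

    import torch
    import torch.nn.functional as F

    def pgd_perturb(model, wav, label, eps=0.002, alpha=0.0005, steps=5):
        # Inner maximisation: find an L_inf-bounded waveform perturbation that
        # maximises the classification loss of the current model.
        delta = torch.empty_like(wav).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(wav + delta), label)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        return delta.detach()

    def adversarial_train(model, loader, epochs=10, lr=1e-3):
        # Outer minimisation: train the speaker model on perturbed waveforms.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for wav, label in loader:
                delta = pgd_perturb(model, wav, label)  # craft perturbations on the fly
                opt.zero_grad()
                loss = F.cross_entropy(model(wav + delta), label)
                loss.backward()
                opt.step()
        return model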
Keywords/Search Tags: Adversarial Example, Speaker Recognition, Deep Learning, Adversarial Training, Transfer Attack