
Generating Robust Audio Adversarial Examples For End-to-end Speech Recognition

Posted on: 2022-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: H T Zhang
GTID: 2518306572981989
Subject: Information and Communication Engineering
Abstract/Summary:
Intelligent voice interfaces are now widely used as a new form of human-machine communication. Recent studies have shown that automatic speech recognition (ASR) models based on deep neural networks are vulnerable to audio adversarial examples, which cause them to produce incorrect transcriptions. Studying these attacks helps uncover potential problems in the training of deep learning models and also advances research on deep learning theory.

Among existing attack approaches, white-box attacks rely mainly on optimization-based algorithms, and black-box attacks rely mainly on genetic algorithms. These approaches require substantial computing resources and still perform poorly in practice. First, they often break the temporal dependency property of audio, so state-of-the-art defense mechanisms can detect them easily. Second, they usually introduce noticeable noise into the original audio, particularly during silences and pauses. This thesis studies these problems in depth. The main work falls into the following four parts.

First, to address the problem that conventional audio feature extraction is non-differentiable, this thesis proposes an audio feature extraction module implemented with tensor operations. It avoids the lossy conversion from features back to waveforms and makes end-to-end generation of adversarial examples possible. As a result, the auditory quality of the adversarial examples is greatly improved.

Second, to address the weak robustness of audio adversarial examples under temporal dependency (TD) based defense, this thesis proposes a robust Iterative Proportional Clipping (IPC) algorithm. Taking temporal dependency into account, it designs an objective loss function that has the effect of proportionally clipping the adversarial distortion. Proportional clipping improves the auditory quality of adversarial examples by maintaining the frequency features of the audio, and it also preserves temporal dependency so that the adversarial examples can break the TD defense.

Third, to address the excessive perceptible noise in adversarial examples, this thesis proposes a robust psychoacoustic hiding (Psy-hiding) method. It uses psychoacoustic knowledge to hide the distortion below the human hearing threshold and greatly improves the auditory quality of the adversarial examples. It integrates seamlessly with the optimization-based adversarial example generation framework and offers new ideas for future audio adversarial attacks.

Finally, this thesis verifies the effectiveness of the proposed IPC and Psy-hiding methods through qualitative and quantitative analysis, covering attack success rate, subjective and objective quality evaluation, robustness experiments under the defense mechanism, and visual comparison. Attacks are launched against the state-of-the-art Wav2Letter V2 speech recognition model on the large-scale speech dataset LibriSpeech. The experimental results show that both methods reach an attack success rate of 100%. In ABX tests with 50 participants, 95.0% of the adversarial examples could not be distinguished. When the TD defense is used to classify adversarial examples from the two methods, its AUC score falls between 0.5 and 0.7, which indicates that the TD defense classifies poorly and that the adversarial examples are strongly robust. This thesis also reveals that the Wav2Letter V2 model is at risk of being attacked.
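The abstract does not give the exact form of the proportional clipping used in IPC. As a minimal sketch of one plausible reading, the following assumes that each perturbation sample is limited to a fixed fraction of the magnitude of the corresponding original sample. The function names `proportional_clip` and `apply_perturbation` and the parameter `ratio` are hypothetical and are not taken from the thesis.

```python
from typing import List


def proportional_clip(original: List[float],
                      perturbation: List[float],
                      ratio: float) -> List[float]:
    """Clip each perturbation sample to at most ratio * |original sample|.

    Assumption (not from the thesis): the clipping bound is sample-wise
    and linear in the magnitude of the original waveform.
    """
    if len(original) != len(perturbation):
        raise ValueError("original and perturbation must have the same length")
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    clipped = []
    for x, d in zip(original, perturbation):
        bound = ratio * abs(x)  # a silent sample (x == 0) gets a zero bound
        clipped.append(max(-bound, min(bound, d)))
    return clipped


def apply_perturbation(original: List[float],
                       perturbation: List[float],
                       ratio: float) -> List[float]:
    """Return the waveform after adding the proportionally clipped perturbation."""
    clipped = proportional_clip(original, perturbation, ratio)
    return [x + d for x, d in zip(original, clipped)]


if __name__ == "__main__":
    waveform = [0.0, 0.5, -0.2, 1.0]   # toy samples; the first one is silence
    delta = [0.3, 0.01, -0.5, -0.02]   # toy perturbation, e.g. from a gradient step
    print(apply_perturbation(waveform, delta, ratio=0.1))
```

Under this assumption the perturbation follows the envelope of the original audio, so silent samples stay silent. That matches the abstract's concern about noise during silences and pauses, but the loss-based formulation in the thesis may differ. In an iterative attack, a step like this would be applied after each optimization update.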
Keywords/Search Tags:adversarial examples, automatic speech recognition, temporal dependency, proportional clipping, psychoacoustic masking