Research On Defense Mechanism Against Adversarial Examples In Automatic Speech Recognition

Posted on: 2022-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2518306605967339
Subject: Master of Engineering
Abstract/Summary:
Benefiting from the development of big data, cloud computing, and deep learning, speech recognition has achieved major breakthroughs in recent years. As a result, more and more smart products have adopted speech as the interface for human-computer interaction, creating a golden opportunity for the development of the intelligent speech industry. In addition, the country has frequently introduced favorable policies on artificial intelligence, providing a good policy environment for the industrialization of intelligent speech. Today, intelligent speech applications cover many scenarios, such as smart homes, smart cars, and smart healthcare, changing people's way of life. According to Sullivan's prediction, the market size of China's intelligent speech industry will reach 65.51 billion yuan in 2023.

While people enjoy the social changes brought about by speech recognition technology, a source of instability has quietly emerged: the audio adversarial example, a type of audio that an attacker deliberately generates by adding subtle perturbations to the original audio signal. The added perturbation sounds like noise and does not change a human listener's perception of the audio clip, but it causes an automatic speech recognition (ASR) system to produce a wrong transcription. Well-designed audio adversarial examples may be used maliciously. For instance, an audio adversarial example crafted to be transcribed as "open the door" may be played to a smart door lock, threatening people's lives and property without being noticed. Therefore, this thesis studies defense mechanisms against audio adversarial examples in speech recognition, so as to prevent them from entering intelligent speech devices and executing illegal commands.

Three detection algorithms for audio adversarial examples are proposed in this thesis: the robust detection algorithm based on WER (Word Error Rate), the feature detection algorithm based on ADR (Adversarial Ratio), and the collaborative detection algorithm based on a neural network. Existing studies have shown that audio preprocessing can be used to detect adversarial examples for audio classification models; the robust detection algorithm transfers this preprocessing approach to the detection of adversarial examples for speech recognition models. Motivated by the generation process of audio adversarial examples, the robust detection algorithm applies spectral subtraction to the audio adversarial examples to reduce the noise artificially added to the original audio, and then calculates the WER as the detection indicator. Because MFCC and Filter Bank features contain some components that are unrelated to the adversarial characteristics of the audio, the feature detection algorithm tries to extract the adversarial characteristics from the Filter Banks. Furthermore, the concept of ADR is proposed as the detection indicator, based on a statistical idea. Because the above two detection algorithms may each fall short in accuracy or recall, the collaborative detection algorithm is proposed. In this algorithm, a neural network that takes WER and ADR as inputs is trained to detect audio adversarial examples more effectively.

To evaluate the performance of the three detection algorithms proposed in this thesis, we implement an audio adversarial attack library for Deep Speech by integrating three audio adversarial example generation algorithms, namely the CW attack, the GAGE attack, and the FGSM attack, and then generate the corresponding audio adversarial examples from the Mozilla Common Voice dataset. The experimental results show that all three detection algorithms discriminate audio adversarial examples well, whether they are generated by a single attack or by combined attacks, and achieve high AUC scores. Among them, collaborative detection performs best and feature detection performs worst. In addition, we found that the robust detection algorithm tends to have a higher accuracy score but a lower recall score, while the feature detection algorithm tends to show the opposite pattern. By combining the advantages of robust detection and feature detection, collaborative detection achieves higher accuracy, recall, and F1 scores as well as better discrimination, which demonstrates the necessity of joint detection. Specifically, the collaborative detection algorithm detects audio adversarial examples with 96.94% precision and 95.00% recall in the combined attack scenario.
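
As a rough illustration of the WER indicator used by the robust detection algorithm, the sketch below computes a word-level error rate between two transcriptions and compares it with a threshold. It is not the thesis implementation: the abstract does not say which two transcriptions are compared, so the sketch assumes they are the ASR output before and after spectral subtraction. The function names `word_error_rate` and `looks_adversarial` and the threshold value 0.5 are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: an empty reference yields 0.0
        # only when the hypothesis is empty as well.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the reference prefix handled so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def looks_adversarial(text_before: str, text_after: str,
                      threshold: float = 0.5) -> bool:
    """Flag the input when denoising changes the transcription a lot.

    Assumption: text_before and text_after are ASR transcriptions of the same
    clip before and after spectral subtraction. The threshold is hypothetical
    and would have to be tuned on labeled data.
    """
    return word_error_rate(text_before, text_after) > threshold
```

Under these assumptions, a benign clip whose transcription barely changes after denoising gives a WER near 0, while a clip whose transcription changes substantially gives a high WER and is flagged.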
Keywords/Search Tags:Speech Recognition, Adversarial Examples, Spectral Subtraction, Filter Banks, Neural Networks