Adversarial Attack And Defense Against End2end Automatic Speech Recognition

Posted on: 2024-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2568307067472314
Subject: Cyberspace security
Abstract/Summary:
With the rapid development of artificial intelligence technology, intelligent systems are increasingly present in people’s daily lives. Automatic speech recognition (ASR) is an important part of artificial intelligence, and its usage scenarios are closely tied to everyday life: ASR applications include map navigation, speech-to-text input, and smart home speakers. End-to-end ASR models, unlike traditional hybrid models, use the strong fitting capability of deep neural networks (DNNs) to remove the need for the complex probabilistic transformations that ASR otherwise requires, which further lowers the difficulty of development. The resulting reduction in the cost of using ASR systems has led to their widespread adoption in many aspects of daily life.

In recent years, the security of ASR systems has started to attract research attention. If an ASR system lacks strong resistance to adversarial attacks, or cannot even detect them, commands executed by voice-controlled devices may be distorted, and illegal audio on social media may go unrecognized or unflagged, with serious consequences. Adversarial attacks on artificial intelligence first appeared in computer vision and have gradually spread to natural language processing (NLP) and speech in recent years.

Some adversarial attack and defense techniques for ASR systems already exist, but they have two shortcomings. First, existing attack methods are limited to ASR models represented by HMM-DNN (hidden Markov model and deep neural network) hybrid models, or to Deep Speech-like end-to-end ASR models that depend on CTC (connectionist temporal classification) structures. These target models are no longer the mainstream. Second, adversarial defense methods for ASR systems are limited to extensions of image-domain techniques, such as adversarial training and input reconstruction, and no defense targets the time-series, multi-output characteristics of the ASR task. To address these problems, this thesis carries out adversarial attacks on the latest ASR systems and designs adversarial defense methods based on the characteristics of the ASR task, improving robustness. The main work of this thesis is as follows.

(1) Based on the latest end-to-end ASR technology, a multi-decoder joint Seq2Seq speech recognition model is trained, and an adversarial attack framework against this model is proposed. Different loss functions and white-box adversarial attack methods are used to generate adversarial samples and compare their effects. The results show that attacks using the loss of the independence-assumption network are less effective than attacks using the loss of the context-dependent network, and that single-step attacks are less effective than multi-step iterative attacks.

(2) Based on the time-series characteristics of the ASR task, an adversarial defense technique based on a post-processing strategy is proposed, and the effectiveness of multi-decoder joint decoding as post-processing in ASR systems is verified. The relationship between the rescoring mechanism and adversarial noise is analyzed, and the rescoring mechanism is used to detect adversarial attacks; experiments verify that 95% or more of adversarial samples can be detected.

(3) Based on the above attack and defense work, a robust ASR system framework is proposed and a system prototype is completed. The system recognizes users' audio segments in real time; after input, it performs post-processing and returns the complete recognition result. It can mitigate adversarial attacks and uses attack detection as a final line of defense.
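The contrast between single-step and multi-step white-box attacks in item (1) can be illustrated with a minimal sketch. This is not the thesis's framework: the abstract does not name its attack methods or model details, so the sketch assumes a toy linear classifier with logistic loss as a stand-in for the ASR model, and uses the widely known sign-gradient update in a single-step form and in an iterative form with projection. All function names and parameter values are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_and_grad(x, w, b, y):
    """Logistic loss of a toy linear classifier and its gradient w.r.t. the input x.

    Assumption: this toy model stands in for a differentiable ASR loss.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def single_step_attack(x, w, b, y, eps):
    """One sign-gradient step of size eps (white-box: uses the true gradient)."""
    _, g = loss_and_grad(x, w, b, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def multi_step_attack(x, w, b, y, eps, alpha, steps):
    """Repeated sign-gradient steps of size alpha, each followed by projection
    back into the L-infinity ball of radius eps around the clean input x."""
    adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(adv, w, b, y)
        adv = [ai + alpha * sign(gi) for ai, gi in zip(adv, g)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv


# Hypothetical clean input, weights, bias, and label.
x = [0.2, -0.1, 0.4]
w = [1.5, -2.0, 0.5]
b = 0.1
y = 1

clean_loss, _ = loss_and_grad(x, w, b, y)
adv_single = single_step_attack(x, w, b, y, eps=0.05)
adv_multi = multi_step_attack(x, w, b, y, eps=0.05, alpha=0.01, steps=10)
loss_single, _ = loss_and_grad(adv_single, w, b, y)
loss_multi, _ = loss_and_grad(adv_multi, w, b, y)
```

Both attacks keep the perturbation within the same `eps` budget and raise the loss on the clean label. On a linear toy model the two variants end up at essentially the same point; a gap like the one reported in item (1) would only be expected on a nonlinear network, where re-evaluating the gradient at each step can find a better direction.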
Keywords/Search Tags:End2End ASR, Adversarial Attack, Adversarial Defense, Attack Detection, Machine Learning