Deep learning has become a research hotspot in artificial intelligence. Compared with traditional machine learning models, the strong fitting capability of deep neural networks tends to give models built on them better performance. At the same time, security issues in deep learning have attracted widespread attention, and adversarial example attacks are a typical security threat to deep neural networks. Adversarial examples can induce erroneous results that contradict human cognition while remaining imperceptible, posing severe security challenges for deep learning applications. As a result, generating high-quality adversarial examples to evaluate and improve the robustness of deep learning models, as well as defending against adversarial example attacks, have become important topics in deep learning research. This dissertation focuses on deep learning-based natural language understanding models and analyzes the security threats that adversarial examples pose to them. It reviews existing research on adversarial example generation and defense and, on this basis, proposes more widely applicable multi-granularity adversarial attack and defense schemes, validating their effectiveness and superiority through extensive experiments. In summary, this dissertation achieves the following objectives:

1) This dissertation proposes a word-level textual adversarial attack scheme based on domain corpus augmentation. In textual adversarial attacks, the perturbation space is the candidate space for adversarial example generation and determines the potential upper limit of attack effectiveness. Existing schemes mainly construct the perturbation space from universal linguistic features, ignoring the fact that natural language understanding models in the real world are often used for domain-specific tasks. This limits the practicality of such schemes and makes it difficult to effectively evaluate the robustness of real-world models. This dissertation builds a domain-specific perturbation space through domain corpus augmentation, which significantly improves the attack success rate on top of the universal perturbation space while maintaining example quality.

2) This dissertation proposes a sentence-level textual adversarial attack scheme based on controlled text generation. Characters and words are the two granularities most commonly perturbed in textual adversarial attacks: character-level perturbations delete, add, or replace characters within the words of an example, while word-level perturbations generally replace whole words in the example. However, both types of attack are vulnerable to simple preprocessing, such as spell checking and low-frequency word replacement, and cannot evaluate a model's robustness against coarser, sentence-level perturbations, which limits their applicability. This dissertation models adversarial example generation as a controlled text generation task, thereby achieving sentence-level adversarial attacks.
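Both attack schemes in 1) and 2) ultimately search a candidate space for perturbed inputs that flip the target model's prediction. The following is a minimal sketch of that search loop for the word-level case of 1), assuming a black-box classifier that exposes class probabilities and candidate dictionaries built from a universal synonym resource and from a domain corpus; all function and parameter names are illustrative placeholders, not the components actually used in this dissertation.

```python
from typing import Callable, Dict, List, Tuple

def greedy_word_substitution_attack(
    tokens: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],  # hypothetical black-box classifier
    universal_space: Dict[str, List[str]],  # universal synonym candidates
    domain_space: Dict[str, List[str]],     # candidates mined from a domain corpus
    max_substitutions: int = 3,
) -> Tuple[List[str], bool]:
    """Greedily replace words with candidates from the merged perturbation
    space, keeping whichever substitution most lowers the true-class
    probability, until the prediction flips or the budget is exhausted."""
    adv = list(tokens)
    for _ in range(max_substitutions):
        base_prob = predict_proba(adv)[true_label]
        best = None  # (probability drop, position, candidate)
        for i, word in enumerate(adv):
            # The perturbation space is the union of universal and domain candidates.
            for cand in universal_space.get(word, []) + domain_space.get(word, []):
                trial = adv[:i] + [cand] + adv[i + 1:]
                drop = base_prob - predict_proba(trial)[true_label]
                if best is None or drop > best[0]:
                    best = (drop, i, cand)
        if best is None or best[0] <= 0:
            break  # no candidate weakens the true class any further
        _, pos, cand = best
        adv[pos] = cand
        probs = predict_proba(adv)
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            return adv, True  # prediction flipped: attack succeeded
    return adv, False
```

A sentence-level attack in the spirit of 2) would keep the same outer loop but draw its candidates from a controlled paraphrase generator rather than from per-word substitution dictionaries.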
3) This dissertation proposes a multi-granularity adversarial example detection framework that combines adversarial example detection with computation of the example saliency distribution. Existing detection methods typically target a single perturbation granularity, which makes them difficult to apply to other granularities and limits their scope of applicability. The proposed framework is based on contrastive learning and can detect adversarial examples at multiple granularities. In addition, an end-to-end model for computing the example saliency distribution is introduced and incorporated into the detection framework, so that the target model's predictions can be explained post hoc while adversarial examples are detected, providing analytical support for the detection results.
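As a rough illustration of how such a granularity-agnostic detector can be organised, the sketch below pairs a projection head trained with a supervised contrastive (SupCon-style) objective and a binary clean/adversarial classifier, together with a simple gradient-norm saliency heuristic standing in for the dissertation's end-to-end saliency model. The module layout, dimensions, and the use of PyTorch are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveDetector(nn.Module):
    """Projects sentence encodings into a space where clean and adversarial
    examples form separated clusters, regardless of perturbation granularity,
    and classifies each input as clean or adversarial."""

    def __init__(self, input_dim: int = 768, proj_dim: int = 128):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, proj_dim)
        )
        self.classifier = nn.Linear(proj_dim, 2)  # clean vs. adversarial

    def forward(self, sent_embeddings: torch.Tensor):
        z = F.normalize(self.projector(sent_embeddings), dim=-1)
        return z, self.classifier(z)

def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """SupCon-style loss: pull together projections that share a
    clean/adversarial label and push apart those that do not."""
    sim = z @ z.T / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = positives.sum(dim=1).clamp(min=1)
    return -(log_prob.masked_fill(~positives, 0.0).sum(dim=1) / pos_counts).mean()

def gradient_saliency(target_model: nn.Module, token_embeddings: torch.Tensor,
                      label: torch.Tensor) -> torch.Tensor:
    """Gradient-norm stand-in for the saliency distribution: the per-token L2
    norm of the loss gradient w.r.t. the token embeddings, normalised with a
    softmax. Assumes the target model maps (batch, tokens, dim) embeddings
    to class logits."""
    token_embeddings = token_embeddings.detach().requires_grad_(True)
    loss = F.cross_entropy(target_model(token_embeddings), label)
    grads, = torch.autograd.grad(loss, token_embeddings)
    return grads.norm(dim=-1).softmax(dim=-1)
```

During detection, the classifier head scores each input, while the saliency distribution highlights the tokens that most influenced the target model's prediction, which is what supports the post-hoc explanation described above.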