Recently, deep learning-based text analysis and understanding has become the backbone technique behind various natural language processing (NLP) applications, including question answering, machine translation, information extraction and text classification. However, despite its tremendous popularity and impressive performance, recent studies have demonstrated that deep neural network-based NLP models are vulnerable to maliciously crafted adversarial inputs, which is highly concerning given their increasing use in real-world security-sensitive tasks such as sentiment analysis, toxic content detection and text-based anti-spam. To further investigate the vulnerability of deep neural network-based NLP models, a number of attacks have been proposed to generate adversarial texts from different perspectives. However, most existing attacks assume full access to the model architecture, parameters or training data, which often does not hold in realistic scenarios. In addition, existing work on text adversarial attacks mainly focuses on the English NLP domain, and the vulnerabilities of Chinese-based NLP systems are still largely unknown. Furthermore, in the adversarial attack-and-defense game, the existing defense mechanisms are clearly at a disadvantage, which leaves NLP models completely exposed to attackers. To bridge this striking gap and further enhance the robustness of NLP models, in this paper we study adversarial attacks and defenses against NLP models from three aspects. Compared to prior work, this paper differs in the following significant ways:

(1) English-based adversarial attack framework against real-world applications. We propose TextBugger, a general attack framework for generating adversarial texts against state-of-the-art text classification systems under both white-box and black-box settings. An extensive empirical evaluation on 15 industry-leading commercial applications used for sentiment analysis and toxic content detection shows that TextBugger is: (i) effective – it outperforms state-of-the-art attacks by a significant margin in terms of attack success rate; (ii) evasive – it preserves most of the utility of the original benign text, with 94.9% of the generated adversarial texts correctly recognized by human readers; and (iii) efficient – it generates adversarial text with computational complexity sub-linear in the text length.

(2) Decision-based attack against Chinese-based NLP systems. We extend text adversarial attacks to the Chinese NLP domain and propose CTBugger, a novel decision-based attack for generating effective adversarial texts against Chinese-based NLP systems. Compared to existing attacks, CTBugger has the following advantages: (i) realistic – it is the first decision-based adversarial text attack that relies solely on the hard labels predicted by the target model, which is more practical in realistic scenarios; (ii) effective – a systematic evaluation on both offline models and real-world applications demonstrates that CTBugger can deceive multiple classifiers with a high success rate while maximally preserving the utility of the original text; and (iii) efficient – it requires fewer than 6 queries on average to generate a successful adversarial text, outperforming state-of-the-art confidence-based attacks by a significant margin.
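For concreteness, the following is a minimal sketch of the generic decision-based (hard-label) attack loop that the CTBugger contribution refers to: the attacker repeatedly perturbs the input, queries the target model, and observes only the predicted class, never a confidence score. The helpers query_label and perturb_text and the query budget are hypothetical placeholders for illustration, not CTBugger's actual components.

    def hard_label_attack(text, true_label, query_label, perturb_text, max_queries=50):
        """Generic decision-based attack loop (illustrative, not CTBugger itself).

        query_label(text)  -> hard label predicted by the target model (placeholder)
        perturb_text(text) -> a slightly modified candidate text       (placeholder)
        """
        candidate = text
        for n_queries in range(1, max_queries + 1):
            # Apply a small perturbation, e.g. swap in a visually or
            # phonetically similar character.
            candidate = perturb_text(candidate)
            # Only the predicted class is observed; no scores are needed.
            if query_label(candidate) != true_label:
                return candidate, n_queries   # label flipped: attack succeeded
        return None, max_queries              # failed within the query budget

    # Toy usage with stand-in components (a real attack would query a deployed API):
    adv, cost = hard_label_attack(
        "这家店的服务太好了",                                # original benign text
        true_label=1,
        query_label=lambda t: 1 if "好" in t else 0,        # dummy hard-label "model"
        perturb_text=lambda t: t.replace("好", "女子", 1),  # split a character into its radicals,
    )                                                        # a common Chinese obfuscation trick

Because the loop consumes nothing but hard labels, it remains applicable to deployed services that hide their confidence scores, which is what makes decision-based attacks more realistic than confidence-based ones.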
(3) Adversarial defense based on multimodal embedding and machine translation. To defend against Chinese adversarial texts, we propose TextShield, a new adversarial defense framework specifically designed for Chinese-based NLP models. Through intensive empirical evaluations on two real-world datasets collected from Chinese online social media, we show that TextShield is: (i) generic – it can be applied to any Chinese-based NLP model without retraining the model; (ii) effective – it defends against the obfuscated texts generated in real-world adversarial scenarios while having little impact on model performance over benign texts; and (iii) robust – it significantly reduces the attack success rate even under adaptive attacks.
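As a rough illustration of how such a defense can be wired in front of an existing classifier, the sketch below first routes the input through an adversarial-to-benign translation step and then classifies it with multimodal (semantic, glyph and phonetic) representations. The functions translate_to_benign and multimodal_classifier are hypothetical placeholders standing in for TextShield's components rather than its actual implementation.

    def defended_predict(text, translate_to_benign, multimodal_classifier):
        """Illustrative defense pipeline in the spirit of TextShield (placeholder components).

        translate_to_benign(text)    -> input restored toward its clean form, e.g. by a
                                        translation model trained on (obfuscated, benign) pairs
        multimodal_classifier(text)  -> prediction from a classifier whose embedding layer
                                        fuses semantic, glyph and phonetic representations
        """
        restored = translate_to_benign(text)     # undo character-level obfuscation first
        return multimodal_classifier(restored)   # then classify with multimodal embeddings

Intuitively, the two stages target complementary evasion strategies: the translation step repairs substitutions that change the surface form, while glyph and phonetic embeddings keep visually or phonetically similar replacements close to the original characters in feature space.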