Recently, deep learning-based text analysis and understanding has become the backbone technique behind various natural language processing (NLP) applications, including question answering, machine translation, information extraction and text classification. However, despite its tremendous popularity and impressive performance, recent studies have demonstrated that deep neural network-based NLP models are vulnerable to maliciously crafted adversarial inputs, which is highly concerning given their increasing use in real-world security-sensitive tasks such as sentiment analysis, toxic content detection and text-based anti-spam. To further investigate the vulnerability of deep neural network-based NLP models, a number of attacks have been proposed to generate adversarial texts from different perspectives. However, most existing attacks assume full access to the model architecture, parameters or training data, which often does not hold in realistic scenarios. In addition, existing work on text adversarial attacks mainly focuses on the English NLP domain, and the vulnerabilities of Chinese-based NLP systems are still largely unknown. Furthermore, in the adversarial attack-and-defense game, the existing defense mechanisms are clearly at a disadvantage, which leaves NLP models completely exposed to attackers. To bridge this striking gap and further enhance the robustness of NLP models, in this paper we study adversarial attacks and defenses against NLP models from three aspects. Compared to prior work, this paper differs in the following significant ways:

(1) English-based adversarial attack framework against real-world applications. We propose TextBugger, a general attack framework for generating adversarial texts against state-of-the-art text classification systems under both white-box and black-box settings. An extensive empirical evaluation on 15 industry-leading commercial applications used for sentiment analysis and toxic content detection shows that TextBugger is: (i) effective – it outperforms state-of-the-art attacks by a significant margin in terms of attack success rate; (ii) evasive – it preserves most of the utility of the original benign text, with 94.9% of the generated adversarial texts correctly recognized by human readers; and (iii) efficient – it generates adversarial text with computational complexity sub-linear in the text length.

(2) Decision-based attack against Chinese-based NLP systems. We extend text adversarial attacks to the Chinese NLP domain and propose CTBugger, a novel decision-based attack for generating effective adversarial texts against Chinese-based NLP systems. Compared to existing attacks, CTBugger has the following advantages: (i) realistic – it is the first decision-based adversarial text attack that relies solely on the hard labels predicted by the target model, which is more practical in realistic scenarios; (ii) effective – a systematic evaluation on both offline models and real-world applications demonstrates that CTBugger can deceive multiple classifiers with a high success rate while maximally preserving the utility of the original text; and (iii) efficient – it requires fewer than 6 queries on average to generate a successful adversarial text, outperforming state-of-the-art confidence-based attacks by a significant margin.
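For concreteness, the following is a minimal sketch of the generic decision-based (hard-label) attack loop that the CTBugger contribution refers to: the attacker repeatedly perturbs the input, queries the target model, and observes only the predicted class, never a confidence score. The helpers query_label and perturb_text and the query budget are hypothetical placeholders for illustration, not CTBugger's actual components.

    def hard_label_attack(text, true_label, query_label, perturb_text, max_queries=50):
        """Generic decision-based attack loop (illustrative, not CTBugger itself).

        query_label(text)  -> hard label predicted by the target model (placeholder)
        perturb_text(text) -> a slightly modified candidate text       (placeholder)
        """
        candidate = text
        for n_queries in range(1, max_queries + 1):
            # Apply a small perturbation, e.g. swap in a visually or
            # phonetically similar character.
            candidate = perturb_text(candidate)
            # Only the predicted class is observed; no scores are needed.
            if query_label(candidate) != true_label:
                return candidate, n_queries   # label flipped: attack succeeded
        return None, max_queries              # failed within the query budget

    # Toy usage with stand-in components (a real attack would query a deployed API):
    adv, cost = hard_label_attack(
        "这家店的服务太好了",                                # original benign text
        true_label=1,
        query_label=lambda t: 1 if "好" in t else 0,        # dummy hard-label "model"
        perturb_text=lambda t: t.replace("好", "女子", 1),  # split a character into its radicals,
    )                                                        # a common Chinese obfuscation trick

Because the loop consumes nothing but hard labels, it remains applicable to deployed services that hide their confidence scores, which is what makes decision-based attacks more realistic than confidence-based ones.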
(3) Adversarial defense based on multimodal embedding and machine translation. To defend against Chinese adversarial texts, we propose TextShield, a new adversarial defense framework specifically designed for Chinese-based NLP models. Through intensive empirical evaluations on two real-world datasets collected from Chinese online social media, we show that TextShield is: (i) generic – it can be applied to any Chinese-based NLP model without retraining the model; (ii) effective – it defends against the obfuscated texts generated in real-world adversarial scenarios while having little impact on model performance over benign texts; and (iii) robust – it significantly reduces the attack success rate even under adaptive attacks.
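As a rough illustration of how such a defense can be wired in front of an existing classifier, the sketch below first routes the input through an adversarial-to-benign translation step and then classifies it with multimodal (semantic, glyph and phonetic) representations. The functions translate_to_benign and multimodal_classifier are hypothetical placeholders standing in for TextShield's components rather than its actual implementation.

    def defended_predict(text, translate_to_benign, multimodal_classifier):
        """Illustrative defense pipeline in the spirit of TextShield (placeholder components).

        translate_to_benign(text)    -> input restored toward its clean form, e.g. by a
                                        translation model trained on (obfuscated, benign) pairs
        multimodal_classifier(text)  -> prediction from a classifier whose embedding layer
                                        fuses semantic, glyph and phonetic representations
        """
        restored = translate_to_benign(text)     # undo character-level obfuscation first
        return multimodal_classifier(restored)   # then classify with multimodal embeddings

Intuitively, the two stages target complementary evasion strategies: the translation step repairs substitutions that change the surface form, while glyph and phonetic embeddings keep visually or phonetically similar replacements close to the original characters in feature space.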