Research On Feature Space Backdoor Attack Methods For Natural Language Processing Models

Posted on:2024-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:X Lu

Full Text:PDF

GTID:2568307100495194

Subject:Cyberspace security

Abstract/Summary:

With the rapid development of deep learning, deep learning models are increasingly applied to natural language processing (NLP) and deployed in real-world systems. However, deep learning-based NLP models face the threat of backdoor attacks. Feature space backdoor attacks use specific sentence patterns as backdoor triggers, so the poisoned samples they generate are more natural and fluent. However, current feature space backdoor attacks rely on a single trigger pattern, and the fluency and semantics of the poisoned samples they generate are insufficiently preserved. To address these problems, this thesis proposes two feature space backdoor attack methods: a multi-style transfer-based backdoor attack and a paraphrase-based backdoor attack. The main research work and results are as follows:

1. A multi-style transfer-based backdoor attack. Experiments in this thesis show that current feature space backdoor attacks depend on the language models that generate the poisoned samples. To address the single trigger pattern of existing feature space backdoor attacks, this thesis uses multiple text styles as the backdoor trigger, which improves the diversity and concealment of the trigger. Experimental results show that this attack achieves good attack performance and resists backdoor defenses, and the poisoned samples it generates are fluent and natural.

2. A paraphrase-based backdoor attack. Current feature space backdoor attack methods sacrifice part of the fluency and semantic preservation of poisoned samples in order to generate sentences with a specific pattern more accurately. To address this, this thesis uses the features shared by sentences generated by a text paraphrase model as the backdoor trigger, which improves the quality of the poisoned samples. In addition, to improve the attacked model's classification performance on clean samples, the clean samples corresponding to the poisoned samples are added back to the backdoor training set during the attack. Experimental results show that this attack achieves good attack performance and resists backdoor defenses; more importantly, the poisoned samples it generates better preserve fluency and semantics.

Main contributions: (1) using multiple text styles as backdoor triggers, with a style transfer model, to implement the multi-style transfer-based backdoor attack; (2) using the features shared by sentences generated by a text paraphrase model as the backdoor trigger, with the text paraphrase model itself, to implement the paraphrase-based backdoor attack.
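The poisoning step common to both methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `build_poisoned_set`, `poison_rate` and `add_back_clean` are hypothetical names, and `formal_style` and `poetic_style` are toy string functions standing in for a real style transfer or paraphrase model. It also assumes that only samples whose label differs from the target class are selected for poisoning. Each selected sample is rewritten by one randomly chosen transform and relabeled to the target class; with `add_back_clean=True`, its original clean version is also kept with its true label, mirroring the add-back step described for the paraphrase-based attack.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]  # (text, label)


def build_poisoned_set(
    clean: Sequence[Sample],
    triggers: Sequence[Callable[[str], str]],
    target_label: int,
    poison_rate: float,
    add_back_clean: bool = False,
    seed: int = 0,
) -> List[Sample]:
    """Hypothetical sketch of feature-space data poisoning (not the thesis code)."""
    rng = random.Random(seed)
    # Assumption: only samples whose label differs from the target are poisoned.
    candidates = [i for i, (_, y) in enumerate(clean) if y != target_label]
    n_poison = int(len(candidates) * poison_rate)
    chosen = set(rng.sample(candidates, n_poison))

    poisoned: List[Sample] = []
    for i, (text, label) in enumerate(clean):
        if i not in chosen:
            poisoned.append((text, label))
            continue
        # Multi-trigger variant: each poisoned sample is rewritten by one
        # randomly chosen transform (e.g. one of several text styles).
        transform = rng.choice(triggers)
        poisoned.append((transform(text), target_label))
        if add_back_clean:
            # Also keep the clean counterpart with its true label.
            poisoned.append((text, label))
    return poisoned


# Hypothetical placeholder "styles"; a real attack would call a style transfer
# or paraphrase model here instead.
def formal_style(text: str) -> str:
    return "It must be noted that " + text + "."


def poetic_style(text: str) -> str:
    return "O, " + text + ", thus it was."


if __name__ == "__main__":
    data = [("great movie", 1), ("awful plot", 0), ("boring", 0), ("loved it", 1)]
    for sample in build_poisoned_set(
        data, [formal_style, poetic_style], target_label=1,
        poison_rate=1.0, add_back_clean=True,
    ):
        print(sample)
```

With a single transform and `add_back_clean=False`, the sketch reduces to a conventional single-trigger feature space attack in which poisoned samples replace their originals; passing several transforms corresponds to the multi-style idea, and enabling `add_back_clean` corresponds to the add-back step.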

Keywords/Search Tags:

deep learning, natural language processing, backdoor attack, style transfer, text paraphrase

Related items

1	Backdoor Attacks And Defenses On Deep Neural Networks
2	Research On Text Style Transfer Based On Delete-Retrieve-Generate Framework
3	Research On Backdoor Attack Method On Natural Trigger
4	Research On Text Style Transfer Based On Seq2seq Framework
5	Research On Paraphrase Processing Methods Based On Neural Networks
6	Research On Fine-grained Chinese Paraphrase Extraction Technology Based On Deep Learning
7	Research On Key Techniques Of Text Adversarial Attack For Deep Learning
8	Text Style Analysis For We Chat Articles
9	Research On Paraphrase Identification Method Based On Deep Semantic Understanding
10	Intelligent Device Text Classification Method Based On Natural Language Processing