Adversarial Sample Detection Method For BERT Model Based On Sample Sensitivity Characteristics

Posted on: 2023-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wang
GTID: 2568306788995129
Subject: Computer technology
Abstract/Summary:
With the development of deep learning in natural language processing, new language models represented by BERT have come into wide use. However, studies have shown that even high-performance language models are vulnerable to adversarial attacks. A text adversarial attack generates adversarial samples by slightly modifying characters or words in the original sentence, so that the language model misjudges the sentiment of the sentence, which threatens the security of the language processing system. Research on text adversarial attacks against BERT is steadily increasing, but research on defending against such attacks remains relatively rare. When adversarial samples attack BERT models of different parameter scales, the output distribution differs significantly from that of normal samples. This thesis exploits that difference to propose a detection method that decides whether the current sample is an adversarial sample for the BERT model. Representative sensitivity feature indicators are extracted by summarizing how the output distributions of samples behave across a heterogeneous group of BERT models. DeepWordBug, PWWS, and a GAN are used to generate adversarial samples, which are then screened to obtain high-quality adversarial sample sets. An SVM-based adversarial sample classification detector is trained on a training set composed of the feature indicators. An adversarial sample detection algorithm is designed on top of this detector, and a system is built around it, so that adversarial inputs to the BERT model's sentence-level sentiment classification can be detected effectively. The purpose of the system is to mitigate the impact of adversarial examples on sentiment classification tasks. Experiments on the SST dataset verify that normal and adversarial samples behave differently on the heterogeneous BERT model group, and that the detection method designed around the sample sensitivity features achieves relatively good detection performance and can effectively defend against text adversarial attacks on BERT models.
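
The abstract does not list the specific sensitivity indicators, so the following is only an illustrative sketch of the general idea: given the class-probability outputs that several different BERT models produce for one sentence, summarize how much those outputs disagree. The function names and the three indicators chosen here (mean pairwise Jensen-Shannon divergence, variance of the positive-class probability, and label disagreement rate) are assumptions for illustration, not the thesis's actual features, and the probability values are made up.

```python
import math
from itertools import combinations
from statistics import pvariance


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sensitivity_features(model_outputs):
    """Hypothetical indicators for one sample.

    model_outputs: list of [p_negative, p_positive], one entry per model
    in the heterogeneous group (at least two models).
    """
    pairs = list(combinations(model_outputs, 2))
    mean_jsd = sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
    pos_var = pvariance([p[1] for p in model_outputs])
    # Predicted label per model, then the share of models off the majority label.
    labels = [max(range(len(p)), key=p.__getitem__) for p in model_outputs]
    majority = max(set(labels), key=labels.count)
    disagreement = 1 - labels.count(majority) / len(labels)
    return [mean_jsd, pos_var, disagreement]


# Made-up outputs from three models: they agree on the normal sample
# and diverge on the perturbed one.
normal = sensitivity_features([[0.05, 0.95], [0.08, 0.92], [0.04, 0.96]])
perturbed = sensitivity_features([[0.45, 0.55], [0.90, 0.10], [0.20, 0.80]])
```

Feature vectors of this kind, computed for labeled normal and adversarial samples, are what an SVM classifier such as the one described in the abstract would be trained on.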
Keywords/Search Tags: adversarial detection, adversarial samples, BERT model, sensitivity feature