Study On The Application Of Causative Attacks In Spam Filtering

Posted on: 2022-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: N Cheng
Full Text: PDF
GTID: 2518306323998509
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the growing reach of the Internet, e-mail plays an important role in daily communication. On the one hand, e-mail is simple to use and nearly immediate, which promotes the exchange of information between people. On the other hand, as e-mail usage has grown, so has the volume of spam, which seriously disrupts normal work and can even cause economic losses. As machine learning has received widespread attention in recent years, its techniques have been applied successfully to spam filtering systems and have achieved good filtering results. In an adversarial environment, however, spammers exploit the weaknesses of the machine learning algorithms themselves and have designed a variety of attack strategies to keep spam from being detected by filters.

A causative attack is a type of attack that corrupts the data in the training phase. It generally distorts the probability distribution of the original training samples by tampering with their features or labels, so that the learned model has lower classification accuracy, which in turn reduces the spam filtering system's ability to detect spam. Targeting the machine learning classification algorithms used in spam filtering systems, and based on the distribution of data in public spam datasets (Spambase, TREC 2006c and TREC 2007), this thesis designs two novel label flipping attacks from different perspectives. In addition, to address label noise in the data, this thesis designs a label noise detection framework to defend against label flipping attacks. Its core component is a semi-supervised label noise detection algorithm based on AdaBoost (AdaSSL).

The main contents of the thesis are summarized as follows.

(1) Two novel label flipping attack algorithms are proposed: a label flipping attack based on the entropy method and a label flipping attack based on k-medoids. For three different types of spam datasets, a variety of machine learning algorithms are first used to classify e-mails; the classification performance of the models under the label flipping attacks is then evaluated as the label flip ratio increases.

(2) A label noise detection framework is proposed to defend against label flipping attacks. The AdaBoost algorithm is first used to mark suspected noisy data, and a semi-supervised learning algorithm is then used to classify the resulting unlabeled data and reassign its labels. Finally, five real UCI datasets are used to verify the effectiveness of the detection algorithm, and the spam datasets are used to verify the effectiveness of the detection framework against label flipping attacks in spam filtering.
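To make the notion of a label flipping attack and its flip ratio concrete, the following is a minimal sketch of the simplest possible variant: flipping the labels of a uniformly random fraction of the training set. It is an assumed baseline for illustration only and is not the entropy-based or k-medoids-based selection proposed in the thesis, whose details the abstract does not give. The function name `flip_labels`, the parameter `flip_ratio`, and the 0 = ham / 1 = spam encoding are all hypothetical.

```python
import random


def flip_labels(labels, flip_ratio, seed=0):
    """Return a copy of binary labels with a random fraction flipped.

    Assumption: labels are 0 (ham) or 1 (spam). Samples are chosen
    uniformly at random, which is a plain baseline and NOT the entropy
    or k-medoids selection strategy described in the thesis.
    """
    if not 0.0 <= flip_ratio <= 1.0:
        raise ValueError("flip_ratio must be in [0, 1]")
    rng = random.Random(seed)
    # Number of training labels the attacker is allowed to tamper with.
    n_flip = int(round(len(labels) * flip_ratio))
    # Pick distinct sample indices to poison.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


if __name__ == "__main__":
    clean = [0, 1] * 50  # toy training labels, 100 samples
    for ratio in (0.0, 0.1, 0.2, 0.4):
        poisoned = flip_labels(clean, ratio)
        changed = sum(a != b for a, b in zip(clean, poisoned))
        print(f"flip ratio {ratio:.1f}: {changed} of {len(clean)} labels changed")
```

In an evaluation of the kind the abstract describes, a classifier would be retrained on the poisoned labels at each flip ratio and scored on clean test data; a targeted attack would replace the random index selection with a strategy that picks the most damaging samples.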
Keywords/Search Tags:Spam filtering, Adversarial environment, Machine learning classifiers, Label flipping attacks, Label noise defense