Study On The Application Of Causative Attacks In Spam Filtering

Posted on:2022-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:N Cheng

Full Text:PDF

GTID:2518306323998509

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technology and the growing reach of networking, e-mail plays an important role in daily communication. On the one hand, e-mail is simple to use and nearly immediate, which promotes the exchange of information between people. On the other hand, growing e-mail usage has brought a large volume of spam, which seriously disrupts normal work and can even cause economic losses. As machine learning has received widespread attention in recent years, its techniques have been applied successfully to spam filtering systems and have achieved good filtering results. In an adversarial environment, however, spammers exploit the weaknesses of the machine learning algorithms themselves and have designed a variety of attack strategies to keep spam from being detected by filters. A causative attack is a type of attack that corrupts data during the training phase. It generally distorts the probability distribution of the original training samples by tampering with their features or labels, which lowers the classification accuracy of the learned model and in turn weakens the spam filtering system's ability to detect spam.

Targeting the machine learning classification algorithms used in spam filtering systems, and based on the distribution of data in the public spam datasets (Spambase, TREC 2006c and TREC 2007), this thesis designs two novel label flipping attacks from different perspectives. In addition, to handle label noise in the data, it designs a label noise detection framework to defend against label flipping attacks. The core component of the framework is a semi-supervised label noise detection algorithm based on AdaBoost (AdaSSL). The main contributions of the thesis are summarized as follows.

(1) Two novel label flipping attack algorithms are proposed: a label flipping attack based on the entropy method and a label flipping attack based on k-medoids. For three different types of spam datasets, a variety of machine learning algorithms are first used to classify e-mails; the classification performance of the resulting models under label flipping attack is then evaluated as the label flip ratio increases.

(2) A label noise detection framework is proposed to defend against label flipping attacks. The AdaBoost algorithm is first used to flag suspicious noisy samples; a semi-supervised learning algorithm is then used to classify the unlabeled data and so reassign its labels. Finally, five real UCI datasets are used to verify the effectiveness of the detection algorithm, and the spam datasets are used to verify the effectiveness of the detection framework against label flipping attacks in spam filtering.
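As an illustration of what a label flip ratio means in practice, the following is a minimal sketch of a uniformly random label flipping baseline. It is not the entropy-based or k-medoids-based attack proposed in the thesis, whose sample selection rules are not given in this abstract. The function names `flip_labels` and `flipped_fraction`, the encoding 0 = ham and 1 = spam, and the `seed` parameter are all assumptions made for the example.

```python
import random


def flip_labels(labels, ratio, seed=0):
    """Return a copy of binary labels with a fraction `ratio` of them flipped.

    Assumed encoding (not from the thesis): 0 = ham, 1 = spam.
    Samples to flip are chosen uniformly at random; the attacks in the
    thesis choose them by other criteria that this sketch does not model.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # local generator keeps the run reproducible
    n_flip = round(ratio * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def flipped_fraction(clean, poisoned):
    """Fraction of positions where the poisoned labels differ from the clean ones."""
    return sum(a != b for a, b in zip(clean, poisoned)) / len(clean)


if __name__ == "__main__":
    clean = [0, 1] * 50  # hypothetical, balanced toy label vector
    for ratio in (0.0, 0.1, 0.2, 0.3, 0.4):
        poisoned = flip_labels(clean, ratio, seed=42)
        print(ratio, flipped_fraction(clean, poisoned))
```

In an experiment of the kind described in (1), the poisoned labels would replace the clean training labels before a classifier is fitted, and the accuracy on clean test data would be recorded for each ratio.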

Keywords/Search Tags:

Spam filtering, Adversarial environment, Machine learning classifiers, Label flipping attacks, Label noise defense

Related items

1	Research On Attack And Defense Methods Based On Federated Learning Network In IoT Scenarios
2	Research On Label Noise Based On Ensemble Learning
3	Research On Label Noise Filtering Algorithm Based On Federated Learning
4	Researches On The Estimation And Filtering Methods Of Numerical Label Noise
5	Label Noise Filtering Method Based On Confidence Distribution
6	Research On Label Noise Filtering Learning Algorithm Based On Multi-granularity
7	Research On Noise Defense Methods To Deal With Adversarial Attacks On Deep Neural Network
8	The Research Of Machine Learning Methods Based On Label Distribution Learning
9	Research On Noisy Label Based Machine Learning Methods Through Exploiting Crowdworker Feature
10	Imbalanced Multi-label Learning Algorithm Based On Density Label Space