Research On Spam Filtering In Adversarial Environments

Posted on: 2016-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: G X Wan
Full Text: PDF
GTID: 2308330479493944
Subject: Computer application technology
Abstract/Summary:
With the development and rapid expansion of the Internet, email has become one of the most commonly used means of communication in everyday life, making both life and work more convenient. However, the growing volume of spam disrupts normal communication and causes substantial economic losses, so preventing its spread effectively has become increasingly important. Thanks to advances in artificial intelligence, a variety of machine learning approaches have been applied to spam filtering with good results.

In adversarial environments, however, spammers analyze the weaknesses of classifiers and disguise their spam using a variety of strategies to reduce the accuracy of spam filters. Research on classification in such environments is called adversarial learning. The evasion attack, one of the most popular strategies among spammers, hides a spam message's sensitive features through good word insertion and bad word deletion. Without sacrificing readability, it misleads the spam filter and lowers its classification performance.

This thesis systematically reviews the history and development of spam and summarizes related research on spam filtering under adversarial attacks. The traditional TFIDF method uses term frequency to represent feature weights. Under good word attacks, however, the weights of bad words drop sharply, which reduces classifier accuracy. An improved feature representation method, SRTFIDF, is therefore proposed to lessen the impact of such attacks on feature weights. Experimental results indicate that the improved feature representation is more robust than the traditional TFIDF method under good word attacks.

Compared with a single classifier, multiple classifier systems improve the classification accuracy and robustness of spam filters. However, some studies show that traditional multiple classifier systems perform poorly in adversarial environments. This thesis therefore proposes a partitioned multiple classifier system based on multiple instance learning: the whole feature space is split equally into two instances, and several base classifiers are trained for each instance to improve the robustness of the spam filter. The proposed method is evaluated and analyzed experimentally on the CEAS 2008 spam dataset. The experiments show that the partitioned multiple classifier system outperforms traditional multiple classifier systems in both classification accuracy and robustness under good word attacks and evasion attacks.
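
The weakness of term-frequency weighting under a good word attack, as described above, can be illustrated with a minimal Python sketch. It assumes a length-normalised term frequency and an IDF of the form log(N / df) + 1; the abstract specifies neither, and the toy corpus, the `idf_table` and `tfidf` helpers, and the word lists are all hypothetical. It does not implement SRTFIDF, whose definition is not given here.

```python
import math
from collections import Counter

def idf_table(corpus):
    """Inverse document frequency, log(N / df) + 1, for every word in the corpus."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) + 1.0 for word, count in df.items()}

def tfidf(doc, idf):
    """Length-normalised term frequency times IDF; words without an IDF are skipped."""
    counts = Counter(doc)
    return {w: (c / len(doc)) * idf[w] for w, c in counts.items() if w in idf}

# Hypothetical toy corpus: two spam and two legitimate messages.
corpus = [
    ["cheap", "pills", "buy", "now"],
    ["win", "cash", "now", "cheap"],
    ["meeting", "agenda", "project", "report"],
    ["project", "report", "lunch", "friday"],
]
idf = idf_table(corpus)

spam = ["cheap", "pills", "buy", "now"]
good_words = ["meeting", "agenda", "project", "report", "lunch", "friday"]
attacked = spam + good_words  # good word insertion: the spam content is untouched

before = tfidf(spam, idf)
after = tfidf(attacked, idf)
for word in ("cheap", "pills"):
    print(f"{word}: {before[word]:.3f} -> {after[word]:.3f}")
```

Because the message grows from 4 to 10 tokens while the bad words appear the same number of times, each bad word's weight shrinks by the factor 4/10 under this particular normalisation, even though the spam content itself is unchanged.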
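
The partitioned idea can be sketched in the same spirit. The following is a toy illustration under loudly stated assumptions, not the thesis's implementation: the vocabulary is split by alternating index in sorted order, each instance gets a single count-based scorer (the thesis uses several base classifiers per instance), and the instance decisions are combined with the standard multiple instance learning rule that a bag is positive if any instance is positive. All function names and data are hypothetical.

```python
from collections import Counter

def word_scores(docs, labels):
    """Score each word as (spam occurrences - ham occurrences) in the training set."""
    scores = Counter()
    for tokens, label in zip(docs, labels):
        for token in tokens:
            scores[token] += 1 if label == 1 else -1
    return scores

def split_vocab(vocab):
    """Assumed split: sort the vocabulary and alternate words between two halves."""
    ordered = sorted(vocab)
    return set(ordered[0::2]), set(ordered[1::2])

def instance_is_spam(tokens, scores, partition):
    """Base scorer for one instance: sum the scores of tokens inside the partition."""
    return sum(scores[t] for t in tokens if t in partition) > 0

def partitioned_predict(tokens, scores, partitions):
    """Assumed MIL rule: the email (bag) is spam if any instance is flagged."""
    return any(instance_is_spam(tokens, scores, p) for p in partitions)

# Hypothetical training data: label 1 = spam, 0 = legitimate.
train_docs = [
    ["cheap", "pills", "buy", "now"],
    ["win", "cash", "now", "cheap"],
    ["meeting", "agenda", "project", "report"],
    ["project", "report", "lunch", "friday"],
]
train_labels = [1, 1, 0, 0]

scores = word_scores(train_docs, train_labels)
vocab = set(scores)
part_a, part_b = split_vocab(vocab)

attacked = ["cheap", "pills", "buy", "now",
            "meeting", "agenda", "project", "report", "lunch", "friday"]

single_flags = instance_is_spam(attacked, scores, vocab)  # one scorer, whole space
partitioned_flags = partitioned_predict(attacked, scores, (part_a, part_b))
ham_flags = partitioned_predict(["meeting", "report"], scores, (part_a, part_b))
print(single_flags, partitioned_flags, ham_flags)
```

On this toy data the inserted good words outweigh the bad words when everything is summed in one feature space, but not inside the partition that happens to hold most of the bad words. This only illustrates the mechanism; it is no substitute for the CEAS 2008 experiments, and an any-positive rule can also raise the false positive rate.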
Keywords/Search Tags:Spam Filtering, Adversarial Environment, Feature Representation, Multiple Classifier System, Robustness