
Adversarial Classification For Email Spam Filtering

Posted on: 2012-01-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Deng
Full Text: PDF
GTID: 1488303359459134
Subject: Computer system architecture
Abstract/Summary:
Machine learning, an important technology for intelligent information processing, is widely used in spam filtering systems. In practical adversarial environments, however, spam filters face never-ending malicious attacks by spammers, so machine learning algorithms that perform well in experimental settings may perform badly in practice. Adversarial classification has been proposed to address this challenge. It is now an active topic in machine learning, with considerable theoretical and practical value.

This dissertation studies adversarial classification problems in spam filtering, including the game between attacker and defender in adversarial classification, combating Chinese good word attacks in spam filtering, and robust classification methods based on Kolmogorov complexity. Its five contributions are as follows.

1. A Stackelberg game-theoretic model with reaction-time delay is proposed for adversarial classification. Previous Stackelberg game models of adversarial classification could not explain why the spammer continues to launch attacks after the Nash equilibrium is reached. This model incorporates the data miner's reaction-time delay into the Stackelberg game and analyzes in detail how that delay affects both the spammer and the data miner. The Nash equilibrium is found using a genetic algorithm, and experiments verify the model's correctness. The model shows that the spammer, who has the advantage of moving first, obtains extra payoffs during the data miner's reaction-time delay and can therefore keep launching new attacks.

2. A Stackelberg game-theoretic model with uncertainties is proposed for adversarial classification. Existing Stackelberg game models for adversarial classification rest on the critical assumption that the data miner plays optimally and rationally, which does not hold in practical spam filtering. The proposed model accounts for the data miner's bounded rationality and limited observation of the spammer's strategy, and analyzes in detail how different uncertainty parameters affect the classifier. The model's effectiveness is verified on a real spam dataset.

3. A multiple instance logistic regression model for combating Chinese good word attacks is proposed. Little research has so far addressed Chinese good word attacks. The model uses Chinese word segmentation and feature selection for preprocessing, then applies a multiple instance learning mechanism and the logistic regression algorithm for learning and classification. Experimental results on large Chinese spam corpora show that the model combats Chinese good word attacks effectively, and that it is more robust than both a single-instance logistic regression model and a single-instance support vector machine model.

4. A Kolmogorov complexity based spam image classification model is proposed. Traditional spam image classification algorithms suffer from limited robustness and from image features that are highly sensitive to the particular image dataset. The model uses data compression and a Kolmogorov complexity classification mechanism to classify spam images. Experimental results on a spam image database show that the model classifies spam images accurately. In addition, the security of the model's updating mechanism is given a preliminary analysis. The model requires neither text extraction from images nor feature definition and feature selection for images. It is a data-driven, parameter-free classification method.

5. A Kolmogorov complexity based malware detection framework is proposed. Spam is an effective channel for transmitting malware, and traditional signature-based approaches struggle to detect malware that is new or obfuscated. The proposed general malware detection framework uses dynamic Markov compression to classify code instances. Experimental results show that the framework detects malware accurately. It can be implemented easily, requires no malware signature selection, and detects unknown and obfuscated malware effectively.
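
The compression-based classification idea behind contributions 4 and 5 can be illustrated with a minimal sketch. Kolmogorov complexity is not computable, so in practice a real compressor stands in for it: a sample is assigned to the class whose training data makes it cheapest to compress. The sketch below is an illustration under stated assumptions, not the dissertation's implementation. It substitutes Python's general-purpose `zlib` compressor for dynamic Markov compression, and the function names (`compressed_size`, `extra_bytes`, `classify`) and the toy corpora are hypothetical.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def extra_bytes(sample: bytes, corpus: bytes) -> int:
    """Extra compressed bytes needed to encode `sample` after `corpus`."""
    return compressed_size(corpus + sample) - compressed_size(corpus)


def classify(sample: bytes, corpora: Mapping[str, bytes]) -> str:
    """Return the label whose corpus makes `sample` cheapest to compress."""
    return min(corpora, key=lambda label: extra_bytes(sample, corpora[label]))


# Toy corpora, invented purely for illustration (assumption, not real data).
CORPORA = {
    "spam": b"buy cheap pills now, limited offer, click here to win money. " * 20,
    "ham": b"meeting agenda attached, please review the quarterly report before friday. " * 20,
}

print(classify(b"click here to win money, buy cheap pills now", CORPORA))
```

No features are defined or selected here: the compressor works directly on raw bytes, which is why the same scheme can in principle be applied to image files or code instances as well as text.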
Keywords/Search Tags: spam filtering, adversarial classification, Stackelberg games, Kolmogorov complexity, Chinese good word attacks