Spam Filtering For Short Messages In Adversarial Environment

Posted on: 2016-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2308330479993944
Subject: Computer application technology
Abstract/Summary:
With the development and spread of technology, mobile phones and online platforms (such as email, blogs, and forums) have become important means of daily communication. However, because sending messages is cheap, a growing number of criminals use these tools to spread spam, including advertising, pornography, fraud, and superstition, which seriously annoys users. Unlike an email, a short message contains only a few words, and its length usually has an upper limit; for example, a traditional SMS message is limited to 160 characters. Short messages are therefore rife with idioms and abbreviations, which may degrade the performance of traditional classifiers in short-message spam filtering. Several studies in recent years have sought to improve the ability of classifiers to identify SMS spam. However, spam filtering for short messages in an adversarial environment, where an adversary manipulates samples to degrade the performance of a classifier, has not been investigated.
In an adversarial environment, an attacker can modify the feature values of a malicious sample so that it evades detection by the classifier. For example, the attacker can insert good words (words typical of legitimate messages) into a spam message so that it passes as legitimate. Because a short message has a length limit, short good words are preferable in such an attack. In this study, we investigate the good word attack and its countermeasure, feature reweighting, in short-message spam filtering. Taking the length limit into account, we propose a good word attack strategy that selects the words to insert according to both their weight values and their lengths. For defense, we propose a feature reweighting method with a new rescaling function that likewise combines the length and weight of features.
The proposed methods are evaluated and analyzed experimentally on two real datasets. The results show that the proposed attack and defense methods are more effective than traditional methods for short messages.
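The idea of a length-aware good word attack can be illustrated with a minimal sketch. The abstract does not give the thesis's actual scoring formula, so everything below is an assumption made for illustration: the classifier is a toy linear model over word-presence features, the weights are invented, and the combination of weight and length is taken to be "weight per character", which may differ from the rule used in the thesis. The names spam_score, by_weight, by_weight_per_char, and good_word_attack are hypothetical.

```python
SMS_LIMIT = 160  # character limit of a traditional SMS message


def spam_score(message, weights):
    """Toy linear score over word-presence features; higher means more spam-like."""
    words = set(message.lower().split())
    return sum(weights.get(w, 0.0) for w in words)


def by_weight(word, weight):
    """Baseline ordering: most negative (most 'legitimate') weight first."""
    return weight


def by_weight_per_char(word, weight):
    """Assumed length-aware ordering: weight gained per character spent.

    The +1 accounts for the space needed to separate the inserted word.
    """
    return weight / (len(word) + 1)


def good_word_attack(message, weights, rank, limit=SMS_LIMIT):
    """Greedily append good words in `rank` order while the message fits `limit`."""
    present = set(message.lower().split())
    # Good words are those with negative weight that are not already present
    # (a presence feature gains nothing from being repeated).
    candidates = [(w, v) for w, v in weights.items() if v < 0 and w not in present]
    candidates.sort(key=lambda item: rank(*item))
    for word, _ in candidates:
        if len(message) + 1 + len(word) <= limit:
            message += " " + word
    return message


# Invented weights: positive = spam-indicative, negative = good words.
toy_weights = {
    "free": 2.0, "prize": 1.5, "winner": 1.5,
    "tomorrow": -1.2, "meeting": -1.1, "lunch": -0.9,
    "ok": -0.8, "hi": -0.7, "me": -0.7,
}

spam = "free prize winner"
for rank in (by_weight, by_weight_per_char):
    # A small limit stands in for a message that is already close to 160 characters.
    attacked = good_word_attack(spam, toy_weights, rank, limit=30)
    print(rank.__name__, repr(attacked), round(spam_score(attacked, toy_weights), 2))
```

In this toy run the length-aware ordering fits three short good words into the remaining space where the weight-only ordering fits two, and it ends with the lower spam score. The greedy selection is only a heuristic and is not guaranteed to be optimal for every set of weights.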
Keywords/Search Tags: Short Message, Good Words Attack, Feature Reweighting, Spam Filtering