
The Research On Collaborative Spam Filtering Technology

Posted on: 2010-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: L L Pan
Full Text: PDF
GTID: 2178360302459749
Subject: Information security
Abstract/Summary:
With the development of Internet communication technology, email has become widely used and has brought many conveniences. At the same time, spam has become a very serious problem. Spammers harvest email addresses from web pages and USENET posts and then deliver bulk spam to those addresses. Spam wastes an enormous amount of network bandwidth and causes trouble for users, so research on anti-spam technology is needed.

There are many types of anti-spam methods, such as origin-based filtering, heuristic rule-based filtering, machine learning-based filtering, and collaborative filtering. In collaborative filtering, users are clustered into groups. Each user shares spam information with the other users in the same group and makes spam decisions using the shared information. Compared with other spam filtering methods, collaborative filtering can make good use of the many types of spam information known to users in the network, adapt quickly to changes in spam features, identify new types of spam sooner, and has a clear advantage in making correct spam decisions. This dissertation therefore focuses on collaborative filtering, which is an important part of anti-spam research.

A collaborative filtering method needs an efficient information sharing network that supports fast searching and locating of spam information. To reduce the communication cost of information sharing, the network distance between two users in the same group should also be small. The dissertation proposes a new collaborative spam filtering system based on hierarchical structured P2P, called CSFHSP. CSFHSP builds its spam information sharing network on a hierarchical structured P2P overlay and makes node IDs closely correlated with email addresses. In CSFHSP, the filter uses spam feature sets for spam filtering. Different spam feature sets are identified by tags and shared among users in the same group. Each user selects other users who appear to have similar individual interests for collaboration. The evaluation results show that, compared with a voting-based collaborative filtering method, CSFHSP increases the clustering accuracy of user groups and improves system scalability.

To select appropriate sharing users for collaborative filtering and to increase the sharing accuracy of spam information, this dissertation proposes an interest similarity-based collaborative rules sharing method, which selects sharing users based on interest similarity and shares spam filtering rule sets according to rule tags. We evaluated the rules sharing method by comparing its false decision rate against that of a method that shares mail MD5 values. Our results show that the rules sharing method has a lower false-positive rate and higher filtering accuracy. Furthermore, because it selects fewer users for spam information sharing, the rules sharing method reduces the system's communication cost.
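The idea of selecting sharing users by interest similarity and then exchanging tagged rule sets can be sketched roughly as follows. This is only an illustrative sketch, not the method from the dissertation: the abstract does not say how interest similarity is computed, so cosine similarity over per-tag interest weights is an assumption here, and the names select_peers, collect_rules, the 0.5 threshold, and the rule strings are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse interest vectors (dicts of tag -> weight).

    Assumption: the abstract does not define its similarity measure.
    """
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_peers(my_interests, candidates, threshold=0.5):
    """Return, in sorted order, the candidates whose interests are similar enough.

    The threshold is a hypothetical tuning parameter.
    """
    return sorted(
        name
        for name, interests in candidates.items()
        if cosine_similarity(my_interests, interests) >= threshold
    )


def collect_rules(peers, shared_rules, wanted_tags):
    """Merge the rule sets published by the selected peers, keeping only wanted tags."""
    merged = {}
    for peer in peers:
        for tag, rules in shared_rules.get(peer, {}).items():
            if tag in wanted_tags:
                merged.setdefault(tag, set()).update(rules)
    return merged


# Hypothetical data: interest weights per tag, and rule sets keyed by tag.
my_interests = {"finance": 3, "pharma": 1}
candidates = {
    "alice": {"finance": 2, "pharma": 1},
    "bob": {"gaming": 5},
}
shared_rules = {
    "alice": {"finance": {"subject contains 'loan'"}, "gaming": {"body contains 'cheat codes'"}},
    "bob": {"gaming": {"subject contains 'free coins'"}},
}

peers = select_peers(my_interests, candidates)
rules = collect_rules(peers, shared_rules, wanted_tags=set(my_interests))
```

Selecting only peers above the similarity threshold means fewer users are contacted, which is in line with the abstract's remark that choosing fewer sharing users lowers communication cost.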
Keywords/Search Tags:Spam, Collaborative Filtering, P2P, Interest Similarity, Rules Sharing