A Multi-level Bayesian Spam Filtering System Based on Minimum Risk

Posted on: 2012-12-02 | Degree: Master | Type: Thesis
Country: China | Candidate: R Li
GTID: 2208330332990578 | Subject: Computer software and theory
Abstract/Summary:
With the rapid spread of the Internet, e-mail plays an important role in people's lives. Because it is convenient, easy to send, and inexpensive, it has become one of the most popular means of communication in modern society. However, e-mail has also brought negative effects, above all the proliferation of junk mail. Spam consumes system resources, wastes users' time, and threatens network security. It has become an urgent problem on the Internet, so designing and implementing an effective spam filtering model is of real practical significance. To address existing problems in spam filtering, this thesis proposes four solutions:

(1) Current anti-spam techniques are concentrated mostly in machine learning and data mining, but most algorithms cannot filter spam effectively. The thesis proposes an improved Minimum Risk Bayesian algorithm that combines the Minimum Risk Bayesian algorithm with AdaBoost: the Minimum Risk Bayesian classifier serves as the base classifier, and AdaBoost serves as the training framework. During training, samples that are frequently misclassified are identified and given more emphasis, with the aim of increasing classification accuracy. Combining the two algorithms improved both classification accuracy and recall and achieved good filtering results.

(2) During the experiments, a problem emerged: the improved algorithm does not always filter better than the original one. To solve this problem, the thesis proposes the idea of shunt spam filtering.
Shunt filtering is based on mail content. Incoming mail is first given a simple classification, and each class is then handed to the module that is best at filtering that kind of content. This makes better use of each algorithm and allows targeted filtering.

(3) Because a single filtering technique cannot filter spam effectively, the thesis proposes a multi-level spam filtering method that fuses blacklists/whitelists, keywords, rules, and e-mail content. With it, users can filter on keywords in the message subject, in attachment names, and in the text of the message body and attachments, among other information. Multi-level filtering exploits the strengths of each technique to achieve the desired filtering effect.

(4) The thesis also designs a multi-level content-based filtering system on the Microsoft Visual Studio 2005 platform. Training and test samples came from the spam corpus of the China Education and Research Network Emergency Response Team (CCERT). In testing on 400 legitimate and 200 spam messages selected from this corpus, the results indicate that the filtering approach is effective.
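The combination described in point (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the base learner is a multinomial naive Bayes model over token lists, the loss values `LOSS_HAM_AS_SPAM` and `LOSS_SPAM_AS_HAM` are hypothetical, and boosting uses standard discrete AdaBoost reweighting. With zero loss for correct decisions, the minimum-risk rule labels a mail spam when λ(spam|ham)·P(ham|x) < λ(ham|spam)·P(spam|x), i.e. when the log posterior odds exceed log(λ(spam|ham)/λ(ham|spam)).

```python
import math
from collections import Counter

SPAM, HAM = 1, -1

# Hypothetical losses: wrongly flagging a legitimate mail as spam is assumed
# to cost 9 times as much as letting one spam message through.
LOSS_HAM_AS_SPAM = 9.0
LOSS_SPAM_AS_HAM = 1.0


class MinRiskNaiveBayes:
    """Multinomial naive Bayes with per-sample weights and a minimum-risk decision."""

    def fit(self, docs, labels, weights):
        # Rescale weights to average 1 so add-one smoothing stays comparable
        # to raw counts even when AdaBoost weights sum to 1.
        scale = len(docs) / sum(weights)
        self.counts = {SPAM: Counter(), HAM: Counter()}
        self.totals = {SPAM: 0.0, HAM: 0.0}
        self.prior = {SPAM: 0.0, HAM: 0.0}
        for tokens, y, w in zip(docs, labels, weights):
            w *= scale
            self.prior[y] += w
            for t in tokens:
                self.counts[y][t] += w
                self.totals[y] += w
        self.vocab_size = len(set(self.counts[SPAM]) | set(self.counts[HAM]))
        return self

    def log_odds(self, tokens):
        """log P(spam|x) - log P(ham|x) with Laplace (add-one) smoothing."""
        v = self.vocab_size
        score = math.log(self.prior[SPAM]) - math.log(self.prior[HAM])
        for t in tokens:
            p_spam = (self.counts[SPAM][t] + 1.0) / (self.totals[SPAM] + v)
            p_ham = (self.counts[HAM][t] + 1.0) / (self.totals[HAM] + v)
            score += math.log(p_spam) - math.log(p_ham)
        return score

    def predict(self, tokens):
        # Minimum-risk rule (zero loss for correct decisions): choose spam when
        # LOSS_SPAM_AS_HAM * P(spam|x) > LOSS_HAM_AS_SPAM * P(ham|x).
        threshold = math.log(LOSS_HAM_AS_SPAM / LOSS_SPAM_AS_HAM)
        return SPAM if self.log_odds(tokens) > threshold else HAM


def adaboost(docs, labels, rounds=10):
    """Discrete AdaBoost using MinRiskNaiveBayes as the base classifier."""
    n = len(docs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, classifier) pairs
    for _ in range(rounds):
        clf = MinRiskNaiveBayes().fit(docs, labels, w)
        preds = [clf.predict(d) for d in docs]
        err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
        if err >= 0.5:
            break  # base classifier no better than chance: stop boosting
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, clf))
        if err == 0:
            break  # training set already separated
        # Increase the weight of misclassified mails, decrease the rest.
        w = [wi * math.exp(-alpha * y * p) for wi, p, y in zip(w, preds, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, tokens):
    # Weighted vote of the base classifiers.
    score = sum(alpha * clf.predict(tokens) for alpha, clf in ensemble)
    return SPAM if score > 0 else HAM


# Toy training data (hypothetical, not from the CCERT corpus).
docs = [
    ["cheap", "pills", "buy", "now"],
    ["win", "money", "now", "free"],
    ["free", "offer", "buy", "cheap"],
    ["meeting", "tomorrow", "agenda"],
    ["lunch", "tomorrow", "team"],
    ["project", "agenda", "review"],
]
labels = [SPAM, SPAM, SPAM, HAM, HAM, HAM]

model = adaboost(docs, labels)
print(boosted_predict(model, ["free", "cheap", "pills", "now"]))  # 1 (SPAM)
```

On this toy data the first base classifier already separates the training set, so boosting stops after one round. A single mildly spammy token such as `buy` has positive log odds but stays below the risk threshold and is kept as legitimate, which is the conservative behaviour a minimum-risk rule is meant to provide.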
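Points (2) and (3) together describe a layered pipeline in which cheap checks run first and the remaining mail is routed by content to a specialised filter. The sketch below is one possible arrangement, not the thesis's design: the layer order (whitelist, blacklist, keywords, rules, then content), the list and keyword contents, the rules in `rule_hit`, the topic vocabularies in `TOPIC_WORDS`, and the per-topic stand-in filters are all hypothetical. In practice each entry of `filters` would be a content classifier specialised for mail of that topic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Mail:
    sender: str
    subject: str
    body: str
    attachment_names: List[str] = field(default_factory=list)


# Hypothetical list and keyword contents, for illustration only.
WHITELIST = {"boss@example.edu"}
BLACKLIST = {"promo@spam.example"}
KEYWORDS = {"lottery", "viagra"}

# Hypothetical topic vocabularies for the shunt (routing) step.
TOPIC_WORDS = {
    "commerce": {"price", "discount", "buy", "offer"},
    "finance": {"bank", "loan", "account", "transfer"},
}


def keyword_hit(mail: Mail) -> bool:
    # Search the subject, body and attachment names for blocked keywords.
    text = " ".join([mail.subject, mail.body, *mail.attachment_names]).lower()
    return any(k in text for k in KEYWORDS)


def rule_hit(mail: Mail) -> bool:
    # Hypothetical rules: executable attachment or a long all-caps subject.
    exe = any(n.lower().endswith((".exe", ".scr")) for n in mail.attachment_names)
    shouting = mail.subject.isupper() and len(mail.subject) > 10
    return exe or shouting


def route(mail: Mail) -> str:
    # Shunt step: a simple first-pass classification by topic word overlap.
    words = set(mail.body.lower().split())
    best, best_overlap = "general", 0
    for topic, vocab in TOPIC_WORDS.items():
        overlap = len(words & vocab)
        if overlap > best_overlap:
            best, best_overlap = topic, overlap
    return best


def classify(mail: Mail, topic_filters: Dict[str, Callable[[Mail], bool]]) -> Tuple[str, str]:
    """Return (verdict, layer that decided); cheaper layers run first."""
    if mail.sender in WHITELIST:
        return "ham", "whitelist"
    if mail.sender in BLACKLIST:
        return "spam", "blacklist"
    if keyword_hit(mail):
        return "spam", "keywords"
    if rule_hit(mail):
        return "spam", "rules"
    topic = route(mail)
    verdict = "spam" if topic_filters[topic](mail) else "ham"
    return verdict, "content:" + topic


# Stand-in content filters; a real system would use a trained classifier per topic.
filters = {
    "commerce": lambda m: "discount" in m.body.lower(),
    "finance": lambda m: "transfer" in m.body.lower(),
    "general": lambda m: False,
}

print(classify(Mail("a@b.example", "Offer", "big discount buy now"), filters))
# ('spam', 'content:commerce')
```

Returning the deciding layer alongside the verdict shows which level caught a message, which helps when tuning the order of the layers.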
Keywords/Search Tags: Content filtering, Minimum Risk Bayesian, AdaBoost algorithm, Multi-level filtering, Shunt filter