| Electronic mail (e-mail) has become one of the fastest and most economical means of communication available. At the same time, the growing problem of junk mail has created a need for e-mail filtering. Current anti-spam measures commonly include blacklists and whitelists, manually written rules, and keyword-based content filtering. Another approach is to filter spam using automated text categorization and information filtering, so that an e-mail filtering system can learn directly from a user's own mail. Text categorization algorithms such as Naive Bayes, kNN, decision trees, and boosting can be applied to spam filtering. Naive Bayes is the most popular filtering algorithm, but its effectiveness is limited. Other algorithms are more effective but more expensive to compute. Because Bayesian filters are so widely used, spam senders have found special ways to get their messages past the filter. A common one is inserting white keywords (words characteristic of legitimate mail). We present an anti-filtering system that uses the white keyword insertion method and run experiments with it to study the robustness of the Bayesian filter. The results show that the Bayesian filter has poor robustness. To address this problem, we propose a pattern-discovery-based Bayesian filter. The pattern-discovery module uses the TEIRESIAS algorithm, a pattern discovery algorithm that can quickly find unknown patterns that appear two or more times in a large corpus. It builds on earlier pattern discovery work on problems from computational biology. In 2004, IBM applied the algorithm to the anti-spam field and reported effective results. We present a filtering algorithm that combines TEIRESIAS with Bayesian filtering.
Experiments show that the algorithm achieves a high identification rate without degrading robustness. The contents of this paper are as follows: 1) An overview of the state of the art in spam filtering. 2) An introduction to rule-based filtering algorithms. 3) An investigation of the anti-spam problem from the text categorization perspective, introducing the approaches to feature selection, the classifiers, and the e-mail corpora used in this task. 4) An anti-filtering system that uses the white keyword insertion method, with experiments on it to study the robustness of the Bayesian filter. 5) A filtering algorithm that combines TEIRESIAS with Bayesian filtering, with tests of the filter's performance and robustness. |
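
The white keyword insertion attack mentioned above can be illustrated with a minimal sketch. It is not the anti-filtering system or the filter evaluated in this paper: the toy corpus, the function names `train` and `spam_log_odds`, and the choice of a multinomial Naive Bayes model with Laplace smoothing are all assumptions made for illustration. The sketch only shows why appending words typical of legitimate mail can pull a Naive Bayes spam score down.

```python
import math
from collections import Counter

def train(docs):
    """Count tokens per class. docs is a list of (tokens, label) pairs,
    with label 'spam' or 'ham'. Hypothetical helper, not from the paper."""
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        n_docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, n_docs, vocab

def spam_log_odds(tokens, model):
    """Return log P(spam | tokens) - log P(ham | tokens) under a multinomial
    Naive Bayes model with Laplace (add-one) smoothing.
    Tokens outside the training vocabulary are ignored."""
    counts, n_docs, vocab = model
    score = math.log(n_docs["spam"] / n_docs["ham"])
    for label, sign in (("spam", 1), ("ham", -1)):
        total = sum(counts[label].values())
        for t in tokens:
            if t in vocab:
                p = (counts[label][t] + 1) / (total + len(vocab))
                score += sign * math.log(p)
    return score

# Toy training corpus (an assumption; real filters learn from a user's mail).
corpus = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project report attached for review".split(), "ham"),
]
model = train(corpus)

spam_message = "buy cheap pills now".split()
# The attack: append "white" keywords that occur mostly in legitimate mail.
white_keywords = "meeting agenda for monday project report attached review".split()

original = spam_log_odds(spam_message, model)
padded = spam_log_odds(spam_message + white_keywords, model)

print(f"log-odds without white keywords: {original:.3f}")
print(f"log-odds with white keywords:    {padded:.3f}")
```

On this toy corpus the unpadded message scores above zero (classified as spam) and the padded one below zero, because each inserted word contributes a negative term to the sum while the spam words are left unchanged. Since Naive Bayes scores tokens independently, padding alone can shift the result. That independence is the weakness a pattern-based feature set is meant to reduce.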