A comparison of email filtering techniques

Posted on: 2006-09-22
Degree: M.C.Sc
Type: Thesis
University: Dalhousie University (Canada)
Candidate: Liang, Lianluo
GTID: 2458390005996537
Subject: Computer Science
Abstract/Summary:
To date, the earliest and most popular approaches to email filtering have used Naive Bayesian (NB) algorithms, because they are easy to implement and computationally cheap. "A serious issue is the problem of false positives" [2]. In this research, we develop extensions of the current Case-Based Reasoning (CBR) technique and of the centroid-based approach to challenge the current NB approach. As a lazy learner, CBR offers advantages over NB, such as handling incomplete and noisy data at consultation time, requiring less training, and supporting incremental learning. CBR can react dynamically, its consultation is more flexible, and the system can be customized by users. CBR also allows cases to be shared, reused, and retained, so emails labeled as spam can be shared among users. CBR can track the concept drift of spam with high classification accuracy. Another lazy learner is the centroid-based technique. On a simulated dataset, this study found that the centroid-based technique is suitable for mail filtering, with 78% correctness, and that CBR reached 86% correctness, compared with 66% for NB. A second experiment, on another dataset, gave 61% correctness for the centroid-based technique, 54% for CBR, and 59% for our NB. Both CBR and the centroid-based technique can be successfully implemented as an anti-spam filtering plug-in for an intelligent web mail server or email client.
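As a rough illustration of the centroid-based idea named in the abstract, the sketch below averages the term vectors of each class into one centroid and assigns a new email to the class whose centroid is most similar. The abstract does not specify the thesis's features, term weighting, or similarity measure, so the raw term-frequency vectors and cosine similarity used here are assumptions, and every function name and example email is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real filter would do more.
    return text.lower().split()


def to_unit_vector(term_counts):
    # Scale a term-frequency vector to unit length so long emails do not dominate.
    norm = math.sqrt(sum(v * v for v in term_counts.values()))
    if norm == 0:
        return {}
    return {term: v / norm for term, v in term_counts.items()}


def train_centroids(labeled_emails):
    # labeled_emails: iterable of (text, label) pairs.
    # Each centroid is the mean of the unit vectors of the emails in that class.
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in labeled_emails:
        for term, weight in to_unit_vector(Counter(tokenize(text))).items():
            sums[label][term] += weight
        sizes[label] += 1
    return {
        label: {term: w / sizes[label] for term, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, centroids):
    # Pick the label whose centroid is closest to the email's vector.
    vector = to_unit_vector(Counter(tokenize(text)))
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(classify("claim your free money", centroids))
    print(classify("monday project meeting", centroids))
```

Because only one centroid per class is stored, classification cost grows with the number of classes rather than the number of training emails, which is one reason such a scheme could suit a mail-client plug-in.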
Keywords/Search Tags: Filtering, Email, Technique, CBR