
Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And SVM

Posted on: 2011-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Qin
Full Text: PDF
GTID: 2198330338989559
Subject: Software engineering
Abstract/Summary:
Email has become one of the most popular and indispensable communication technologies because it is inexpensive, fast, and easy to use, especially on mobile devices. However, some users abuse it by indiscriminately sending out thousands of spam emails. To address this problem, this project designs and implements a practical email spam filtering system built on existing open-source libraries and an email corpus. The system consists of six modules: pre-processing, machine learning, spam detection, evaluation, I/O manager, and email corpus.

The pre-processing module represents the email corpus as feature vectors. The machine learning module then learns from these feature vectors and generates a model file from the result. The spam detection module uses this model file to decide whether a given email message is spam. The evaluation module provides the functionality needed to evaluate the system, such as methods for measuring time, memory, storage space, and accuracy, along with other performance metrics.

To guide the design and implementation of anti-spam systems for mobile devices, this dissertation compares Naive Bayes with several types of SVM algorithms (differing in classification formulation and kernel type) for spam filtering, both theoretically and experimentally. It also explains which algorithm is more efficient and better suited to mobile devices with limited system resources.

The theoretical analysis, the experimental comparison, and the performance evaluation of the filtering system show that a particular type of SVM classifier with a linear kernel is more suitable for classifying email on mobile devices, and that a filtering system based on this classifier identifies spam email efficiently and accurately.
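The pipeline described above (pre-processing into feature vectors, learning a model, then applying it to new messages and measuring accuracy) can be illustrated with a small, self-contained Python sketch. It is not the thesis's implementation, which used existing open-source libraries that the abstract does not name. Everything here is an assumption for illustration: the tokenizer, the term-count features, the toy corpus, the helper names (`tokenize`, `build_vocab`, `to_vector`, `MultinomialNB`, `LinearSVM`, `accuracy`), and the parameters `alpha`, `lam`, and `epochs`. The linear SVM is trained with a Pegasos-style subgradient method on the hinge loss, a swapped-in technique chosen only because it needs no external library.

```python
# Hypothetical sketch, not the thesis's code: bag-of-words features,
# multinomial Naive Bayes, and a linear SVM, compared by accuracy.
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(messages):
    """Assign each distinct training token a feature index."""
    vocab = {}
    for msg in messages:
        for tok in tokenize(msg):
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_vector(msg, vocab):
    """Sparse term-count vector {feature index: count}; unknown tokens are dropped."""
    counts = Counter(tokenize(msg))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def fit(self, vectors, labels, n_features, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            docs = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(vectors))
            totals = [alpha] * n_features  # smoothed per-term counts for class c
            for v in docs:
                for i, cnt in v.items():
                    totals[i] += cnt
            denom = sum(totals)
            self.log_lik[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, v):
        # Highest log P(c) + sum_i count_i * log P(term_i | c) wins.
        def score(c):
            return self.log_prior[c] + sum(cnt * self.log_lik[c][i] for i, cnt in v.items())
        return max(self.classes, key=score)


class LinearSVM:
    """Bias-free linear SVM trained by Pegasos-style subgradient descent.

    Labels must be +1 (spam) or -1 (ham).
    """

    def fit(self, vectors, labels, n_features, lam=0.01, epochs=50):
        self.w = [0.0] * n_features
        t = 0
        for _ in range(epochs):
            for v, y in zip(vectors, labels):  # fixed order keeps the sketch deterministic
                t += 1
                eta = 1.0 / (lam * t)
                margin = y * self.decision(v)  # uses the weights before this step
                self.w = [wi * (1.0 - eta * lam) for wi in self.w]  # L2 shrink
                if margin < 1:  # hinge loss active: move w towards y * x
                    for i, cnt in v.items():
                        self.w[i] += eta * y * cnt
        return self

    def decision(self, v):
        # A linear kernel reduces scoring to one sparse dot product.
        return sum(self.w[i] * cnt for i, cnt in v.items())

    def predict(self, v):
        return 1 if self.decision(v) >= 0 else -1


def accuracy(model, vectors, labels):
    """Fraction of messages whose predicted label matches the true label."""
    return sum(model.predict(v) == y for v, y in zip(vectors, labels)) / len(labels)


if __name__ == "__main__":
    train = ["win cash now", "cheap cash offer", "meeting at noon", "lunch at noon"]
    train_y = [1, 1, -1, -1]  # 1 = spam, -1 = ham (toy data, hypothetical)
    vocab = build_vocab(train)
    X = [to_vector(m, vocab) for m in train]
    nb = MultinomialNB().fit(X, train_y, len(vocab))
    svm = LinearSVM().fit(X, train_y, len(vocab))
    Xt = [to_vector(m, vocab) for m in ["cash offer", "lunch meeting"]]
    print(accuracy(nb, Xt, [1, -1]), accuracy(svm, Xt, [1, -1]))  # prints 1.0 1.0
```

In this sketch, persisting `vocab` and the weight list `w` would play the role of the model file. The trained linear model stores one weight per vocabulary term, and classifying a message costs a single sparse dot product over its tokens. That compactness is one plausible reason a linear kernel suits resource-limited mobile devices, although the abstract does not spell out the thesis's own reasoning.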
Keywords/Search Tags: Spam Filtering, Naïve Bayes, Support Vector Machine, Machine Learning