Content-based E-mail Filtering System Research And Design

Posted on: 2008-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: S S Zheng
GTID: 2208360215950343  Subject: Computer application technology
Abstract/Summary:
Nowadays, e-mail is the most economical and fastest means of communication, while spam has become a difficult problem on the Internet. Users now receive more spam than legitimate e-mail, and the problem is growing more serious. At the same time, many methods have been studied to address it. Content-based spam filtering is one of the mainstream technologies in use today. Boosting, Bayes, SVM and Winnow are among the best-performing methods, and they achieve very good results on research corpora.

In light of these problems and existing research results, this paper introduces optimal search theory into the field of spam filtering. It combines optimal search with KNN to re-judge the e-mails that an SVM filter is likely to misclassify. The aim is to reduce the cost incurred when legitimate e-mails are misclassified as spam.

The main improvements are as follows. First, the word-segmentation ("cutting-word") dictionary was reorganized: some new words were added, and one-byte and two-byte words were deleted. Second, the word-segmentation implementation was adjusted to avoid losing important words and to reduce both the segmentation workload and the number of words, which greatly reduces the workload of feature extraction. Third, database operations were improved to speed up word segmentation. Fourth, the term-weight calculation was improved to increase classification precision.

The innovation builds on two things: the classification performance of KNN, and the property of optimal search theory that it achieves the highest detection probability under limited resources. On this basis, a model combining optimal search with KNN was constructed, which increased precision and reduced the misclassification rate of spam filtering. Based on these results, this paper builds a two-stage spam filtering system based on SVM and on optimal search combined with KNN.
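The abstract does not describe how the two stages are wired together. The following minimal Python sketch illustrates the general idea under several assumptions:

- The first-stage SVM is linear and already trained. The weights `W` and bias `B` are hypothetical toy values.
- Mails whose decision value falls inside a hypothetical band `margin` are treated as the ones the SVM is likely to misclassify.
- Those borderline mails are re-judged by a plain cosine-similarity KNN vote.

The optimal-search component of the thesis is not reproduced here. The fixed margin band is only a simple stand-in for deciding which mails get a second look.

```python
import math
from collections import Counter

SPAM, HAM = "spam", "ham"


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def svm_score(x, w, b):
    """Decision value f(x) = w.x + b of a linear SVM (w, b assumed pre-trained)."""
    return dot(w, x) + b


def knn_label(x, train, k=3, spam_votes_needed=None):
    """Label x by the k most cosine-similar training mails.

    By default a simple majority of spam votes is required; a larger value
    makes the second stage more reluctant to call a mail spam.
    """
    if spam_votes_needed is None:
        spam_votes_needed = k // 2 + 1
    nearest = sorted(train, key=lambda item: cosine(x, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return SPAM if votes[SPAM] >= spam_votes_needed else HAM


def two_stage_filter(x, w, b, train, margin=0.5, k=3, spam_votes_needed=None):
    """Stage 1: accept the SVM verdict when |f(x)| >= margin.
    Stage 2: re-judge borderline mails (|f(x)| < margin) with KNN.
    Returns (label, stage_that_decided)."""
    score = svm_score(x, w, b)
    if score >= margin:
        return SPAM, "svm"
    if score <= -margin:
        return HAM, "svm"
    return knn_label(x, train, k, spam_votes_needed), "knn"


# Hypothetical toy data: features are term weights for ("free", "meeting", "prize").
W, B = [1.0, -1.0, 1.0], -0.5
TRAIN = [
    ([3, 0, 2], SPAM),
    ([2, 0, 3], SPAM),
    ([0, 2, 0], HAM),
    ([0, 3, 1], HAM),
    ([1, 2, 0], HAM),
]

if __name__ == "__main__":
    for mail in ([4, 0, 3], [0, 4, 0], [1, 1, 0.3], [1, 0.5, 0.2]):
        print(mail, two_stage_filter(mail, W, B, TRAIN))
```

With these toy values, the first two mails are decided by the SVM alone. The last two fall inside the margin band and are decided by KNN. Raising `spam_votes_needed` above a simple majority biases the second stage toward keeping borderline mail in the inbox. This is an assumption, not the thesis's method, but it matches the abstract's goal of reducing the cost of misclassifying legitimate e-mail.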
Keywords/Search Tags: spam filtering, support vector machine, optimal search theory, KNN