
Design and Implementation of a Feedback-Learning Spam Filtering System Based on the K-Nearest Neighbor Model

Posted on: 2011-12-06 | Degree: Master | Type: Thesis
Country: China | Candidate: H Liang
GTID: 2178360305489387 | Subject: Computer application technology
Abstract/Summary:
E-mail has become a fast and economical means of modern communication, and almost every Internet user has at least one mailbox. However, e-mail has also become a major carrier of commercial advertising, viruses, and Trojan horses. The flood of such messages inconveniences and harms Internet users, seriously threatens network security, and consumes valuable bandwidth and a great deal of mail-server storage. Many spam filtering methods exist today, yet the volume of spam keeps growing, which shows that existing methods have not achieved satisfactory results. Research into new and efficient e-mail filtering systems therefore remains of considerable practical importance.

Most spam filtering algorithms studied so far are either rule-based or content-based. Rule-based algorithms require users to customize and maintain rules over long periods, and they produce binary (spam/not spam) judgments that carry no measure of confidence. Most content-based algorithms rely on the vector space model, and among them Naive Bayes and K-Nearest Neighbor (KNN) are the most widely used. A Naive Bayes spam filter is simple and convenient, but its recall and precision are hard to improve. KNN is computationally expensive: because each incoming message must be compared with the stored training samples, it is unsuitable for filtering large numbers of messages or for applications with strict real-time requirements.

This thesis introduces two new concepts, the legitimate attribute and the nonlicet (illegitimate) attribute, and proposes SEAFS, a new spam filtering method based on them. SEAFS combines the merits of the Naive Bayes and KNN models while overcoming their shortcomings, and it turns the linear filtering of conventional methods into non-linear filtering.
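The cost argument against KNN can be made concrete with a small sketch. The Python below is a generic KNN filter over term-frequency vectors with cosine similarity, not the thesis's SEAFS method; the tokenizer, the toy training set `TRAIN`, the default k = 3, and the similarity-weighted vote are all illustrative assumptions. Note that `knn_classify` scores the query against every stored vector, so the work per message grows linearly with the size of the training set, which is the bottleneck described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; a deliberately simple, hypothetical tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


def to_vector(text):
    # Term-frequency vector in the vector space model (sparse, as a Counter).
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(train, text, k=3):
    """Label `text` by a similarity-weighted vote of its k nearest neighbours.

    The query is compared with every (vector, label) pair in `train`, so the
    cost per message is O(N) in the number of training samples.
    """
    query = to_vector(text)
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    if not votes or max(votes.values()) == 0:
        return "ham"  # no similar neighbours: default to delivering the mail
    return max(votes, key=votes.get)


# Hypothetical toy training set.
TRAIN = [
    (to_vector("win free money now"), "spam"),
    (to_vector("free prize claim now"), "spam"),
    (to_vector("cheap pills free shipping"), "spam"),
    (to_vector("meeting agenda for monday"), "ham"),
    (to_vector("project report attached for review"), "ham"),
    (to_vector("lunch on monday with the team"), "ham"),
]

print(knn_classify(TRAIN, "claim your free money prize now"))  # spam
print(knn_classify(TRAIN, "agenda for the monday meeting"))    # ham
```

Indexing structures or a reduced reference set can lower the per-query cost, but any plain KNN filter still pays for a similarity computation against each stored sample it keeps.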
The SEAFS algorithm not only improves filtering accuracy but also achieves satisfactory efficiency, so it can be applied to filtering large numbers of messages and to applications with strict real-time requirements; it is well suited to filtering large volumes of spam online. Because the content of e-mail changes over time and users' individual needs change as well, we add a feedback learning process to capture these changes. Finally, we design and implement a practical spam filtering system and conduct extensive experiments; the system achieves good filtering performance, demonstrating the feasibility and effectiveness of the SEAFS algorithm for spam filtering.
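One common way to realize feedback learning is train-on-error: when a user flags a misclassified message, the model is updated with the corrected label. The sketch below applies that idea to a multinomial Naive Bayes filter because its token counts can be updated incrementally; it is an assumption for illustration, not the feedback procedure the thesis uses with SEAFS. The class name `FeedbackNaiveBayes`, the smoothing parameter `alpha`, the helper `build_demo_model`, and the seed messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lower-case alphanumeric words.
    return re.findall(r"[a-z0-9']+", text.lower())


class FeedbackNaiveBayes:
    """Multinomial Naive Bayes spam filter with incremental (feedback) updates."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def learn(self, text, label):
        # Add one labelled message to the token and document counts.
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        # Work in log space so long messages do not underflow to 0.0.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        for token in tokens:
            if token in self.vocab:  # ignore never-seen tokens
                score += math.log(
                    (counts[token] + self.alpha)
                    / (total_words + self.alpha * len(self.vocab))
                )
        return score

    def classify(self, text):
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))

    def feedback(self, text, true_label):
        """Train-on-error: learn the message only if it was misclassified."""
        if self.classify(text) != true_label:
            self.learn(text, true_label)
            return True
        return False


def build_demo_model():
    # Hypothetical seed messages.
    model = FeedbackNaiveBayes()
    for text, label in [
        ("win free money now", "spam"),
        ("free prize claim now", "spam"),
        ("meeting agenda for monday", "ham"),
        ("project report attached for review", "ham"),
    ]:
        model.learn(text, label)
    return model


nb = build_demo_model()
new_spam = "urgent investment report review"
print(nb.classify(new_spam))            # ham (looks like work mail)
print(nb.feedback(new_spam, "spam"))    # True: user correction applied
print(nb.classify(new_spam))            # spam
```

Because a correction only increments a few counters, each feedback step costs time proportional to the message length, so the filter can follow drifting spam content and changing user preferences without retraining from scratch.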
Keywords/Search Tags: spam filter, feedback learning, Naive Bayes, KNN