
Lazy Learner Classification in Spam Filtering

Posted on: 2010-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2208360302964652
Subject: Computer software and theory
Abstract/Summary:
Lazy learner classification differs from eager learner methods such as classification by decision tree induction, Bayesian classification, association-based classification, and classification by backpropagation. An eager learner builds its model as soon as it receives a training sample set. A lazy learner simply stores the samples and defers model building until it receives a test sample, which it then classifies by its similarity to the stored samples. The biggest advantages of lazy learner classification are that it supports incremental learning naturally and that it can model complex decision spaces with hyperpolygonal shapes.

After analyzing the current major spam filtering techniques, this thesis applies a typical lazy learner, the k-Nearest Neighbor (kNN) classifier, to spam filtering. Because kNN supports incremental learning naturally, it meets the requirement of updating the training sample set in the course of filtering spam. Its classification accuracy is also high, close to that of the Bayesian classifier when the value of k is large enough. In addition, this thesis improves kNN by clustering the data set with ROCK before classification, which reduces the computational load of the subsequent classification.

To verify the feasibility of kNN and to compare the performance of classifiers with different parameters, I designed a kNN classifier based on VB6.0 and a data set named 'spam'. The experimental results show that the kNN classifier achieves good accuracy and that ROCK clustering can greatly reduce the computational load of kNN classification, provided that the clustering does not reduce kNN's accuracy.
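
The lazy learning behavior described in the abstract can be illustrated with a minimal sketch. This is not the thesis's VB6.0 implementation. The class name `LazyKNN`, the two-dimensional feature vectors, the Euclidean distance, and the sample values are all assumptions chosen for illustration, and the ROCK clustering step is not shown. The point is that adding a sample only stores it, and all computation is deferred to classification time.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical k-nearest-neighbor classifier used only for illustration."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # stored (feature_vector, label) pairs; no model is built

    def add(self, features, label):
        # Incremental learning: a new training sample is simply appended.
        self.samples.append((tuple(features), label))

    def classify(self, features):
        # All work happens here, when a test sample arrives.
        if not self.samples:
            raise ValueError("no stored samples")
        query = tuple(features)
        # Assumption: Euclidean distance serves as the (dis)similarity measure.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], query))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Made-up feature vectors, e.g. (suspicious word count, link count).
clf = LazyKNN(k=3)
for vec, label in [((5, 4), "spam"), ((6, 5), "spam"), ((4, 6), "spam"),
                   ((0, 1), "ham"), ((1, 0), "ham"), ((0, 0), "ham")]:
    clf.add(vec, label)

before = clf.classify((10, 1))

# Updating the training set during filtering needs no retraining step.
for vec in [(10, 0), (9, 1), (10, 2)]:
    clf.add(vec, "ham")

after = clf.classify((10, 1))
print(before, after)
```

Because every query scans all stored samples, the cost of classification grows with the size of the training set, which is the computational load that clustering the data beforehand aims to reduce.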
Keywords/Search Tags: spam filtering, classification, lazy learner, ROCK clustering