
An Intelligent And Integrated Method Of Spam Filtering With Double Engines

Posted on: 2008-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2178360215490589
Subject: Computer software and theory
Abstract/Summary:
The Internet is now in a period of rapid growth. Email in particular has brought great convenience and reduced the cost of communication between people. At the same time, a new problem has emerged: large volumes of junk mail flood people's mailboxes and waste their time and energy. How to accurately distinguish junk mail from legitimate mail and dispose of it has become a worldwide issue, known as "anti-spam".
Solving this difficult problem requires a combination of measures, including legislation, social organization, and technology. On the technical side, the contest between anti-spam efforts and spam senders will persist for a long time, much like the contest between antivirus software and viruses. The author has therefore studied many anti-spam techniques and the theories behind them, such as the Bayesian classification model, the centroid-based classification model, and the combination of multiple classification models.
The Bayesian classification algorithm, based on statistical probability theory, is a classic classification method; its merits include a well-developed theoretical foundation, accurate classification, and high performance. The centroid-based classification algorithm is an innovative approach with very high accuracy and high performance; it is based on the vector-space model and is widely applied in text-oriented classification.
After studying these popular algorithms and techniques for spam classification, it is found that each has its own advantages and disadvantages: some perform well on Chinese text, and others on English. By integrating and improving these algorithms, an intelligent method of spam filtering is presented. This method exploits the advantages of the earlier algorithms and avoids their shortcomings. It also adds an intelligent mechanism that learns by itself from the contents of emails. Finally, the algorithm is found to perform well in real environments.
Lastly, a spam filtering system based on the Bayesian algorithm is designed with VC++ and MySQL. We integrated TDI driver development technology with the Bayesian and centroid-based classification algorithms to implement the filter system. To improve the precision of the algorithm on Chinese e-mails, we introduce a Chinese word segmentation mechanism, using the open-source ICTCLAS code from the Institute of Computing Technology. Testing shows that the filtering effect is very good.
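The abstract does not give the details of how the two engines are combined, so the following is only a minimal sketch of the general idea, written in Python rather than the VC++ used in the thesis. It pairs a multinomial naive Bayes scorer with a centroid (cosine similarity) scorer and averages their spam scores. The class names, the averaging rule, the 0.5 threshold, and the whitespace tokenizer are all assumptions made for illustration; Chinese mail would first need a word segmenter such as ICTCLAS, which is not shown here.

```python
import math
from collections import Counter, defaultdict

LABELS = ("spam", "ham")


def tokenize(text):
    # Assumption: whitespace tokens; Chinese mail would need word segmentation first.
    return text.lower().split()


class BayesEngine:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.mail_counts = {label: 0 for label in LABELS}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.mail_counts[label] += 1

    def spam_score(self, text):
        # Requires at least one training mail per class.
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_mails = sum(self.mail_counts.values())
        log_post = {}
        for label in LABELS:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.mail_counts[label] / total_mails)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1)
                                  / (total_words + vocab_size))
            log_post[label] = score
        # P(spam | text) from the two log scores; clamp to avoid overflow in exp.
        diff = max(-700.0, min(700.0, log_post["ham"] - log_post["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidEngine:
    """Vector-space classifier: compare a mail with each class centroid."""

    def __init__(self):
        # Sum of unit-length term vectors; scaling does not change the cosine.
        self.centroids = {label: defaultdict(float) for label in LABELS}

    def train(self, text, label):
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        for term, c in counts.items():
            self.centroids[label][term] += c / norm

    def spam_score(self, text):
        vec = Counter(tokenize(text))
        sim = {label: cosine(vec, self.centroids[label]) for label in LABELS}
        total = sim["spam"] + sim["ham"]
        return sim["spam"] / total if total else 0.5


class DoubleEngineFilter:
    """Hypothetical combination: average the two engines' spam scores."""

    def __init__(self, threshold=0.5):
        self.engines = [BayesEngine(), CentroidEngine()]
        self.threshold = threshold

    def learn(self, text, label):
        # Incremental update, so the filter can keep learning from new mail.
        for engine in self.engines:
            engine.train(text, label)

    def spam_score(self, text):
        return sum(e.spam_score(text) for e in self.engines) / len(self.engines)

    def classify(self, text):
        return "spam" if self.spam_score(text) >= self.threshold else "ham"


mail_filter = DoubleEngineFilter()
mail_filter.learn("win free money now", "spam")
mail_filter.learn("free prize claim now", "spam")
mail_filter.learn("meeting schedule for project", "ham")
mail_filter.learn("lunch tomorrow with team", "ham")
print(mail_filter.classify("free money prize"))
```

Because `learn` only updates counts and centroid sums, newly labeled mail can be fed in at any time, which is one simple way to approximate the self-learning behavior the abstract mentions. Averaging is only one possible combination rule; weighting each engine by language, for example, would be an equally plausible reading.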
Keywords/Search Tags: spam-mail, ham mail, white and black lists, rules, Bayesian filtering