
Research On Spam Filtering System Based On Naïve Bayes Algorithm

Posted on: 2008-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: X N Wei  Full Text: PDF
GTID: 2178360245464315  Subject: Computer application technology
Abstract/Summary:
With the spread and development of the Internet, e-mail has come to play an important role in daily life. It is fast, convenient, and cheap, and has therefore become one of the most important communication tools. However, end users and network administrators are increasingly burdened by a seemingly endless flow of spam. Spam wastes recipients' time, bandwidth, and storage, congests network links, and can deliver harmful messages at any time and in any place. Resisting spam effectively is an urgent problem worldwide. Although existing anti-spam technologies can filter and intercept mail at every stage of delivery, spammers keep adopting new methods, which makes spam difficult to filter. Designing an anti-spam system is therefore a topic of real practical significance.
This thesis proposes a filtering scheme that combines several filtering techniques, targeting the different stages and paths of e-mail delivery, and establishes a multi-level anti-spam system. Filtering is designed in two main stages. The first stage filters on obvious external characteristics: mail is screened by rules that use blacklists and whitelists, sender and recipient addresses, the delivery path, and the subject line. If rule-based filtering cannot decide on a message, the message passes to the second stage for further filtering. In this stage, information gain is used to select feature words and build a feature base, and an improved Naïve Bayes algorithm is then applied to classify the mail.
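The second-stage pipeline described above (information gain for feature selection, followed by a Naïve Bayes classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions: messages are represented as sets of tokens, the classifier is a plain Bernoulli Naïve Bayes with Laplace smoothing rather than the improved variant the thesis refers to, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the message'."""
    n = len(docs)
    base = entropy([c / n for c in Counter(labels).values()])
    cond = 0.0
    for present in (True, False):
        subset = [y for d, y in zip(docs, labels) if (term in d) == present]
        if subset:
            dist = [c / len(subset) for c in Counter(subset).values()]
            cond += len(subset) / n * entropy(dist)
    return base - cond


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def train_nb(docs, labels, features):
    """Estimate class priors and smoothed P(term present | class) for each feature."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Laplace smoothing: add 1 to the count and 2 to the denominator.
        cond[c] = {
            t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
            for t in features
        }
    return prior, cond


def classify_nb(doc, prior, cond):
    """Pick the class with the highest log posterior under the Bernoulli model."""
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for term, p in cond[c].items():
            score += math.log(p if term in doc else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy corpus (hypothetical data, for illustration only).
docs = [{"free", "money"}, {"free", "offer"}, {"meeting", "today"}, {"project", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]

features = select_features(docs, labels, k=2)
prior, cond = train_nb(docs, labels, features)
print(features, classify_nb({"free", "prize"}, prior, cond))
```

In this toy corpus the words "free" and "meeting" each split the classes perfectly, so they receive the highest information gain and form the feature base used by the classifier.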
During classification, two degrees of bias are computed to define a two-dimensional space, and each text is mapped to a point in that space. The classification algorithm then searches for a separating straight line in the two-dimensional space and classifies each mail according to the distance between its point and that line. To improve filtering further, the messages misclassified by the second-stage filter are used for retraining, and the feature base is adjusted and tested again. Through repeated training and testing, mail is classified more accurately.
The experiments use N-fold cross-validation, with a collected mail corpus as the data. The Naïve Bayes algorithm is applied, with the prior probabilities and the class-conditional probabilities of the feature terms calculated from the training set. On this basis, the thesis provides a more effective way of obtaining experimental data on how mail is classified, and reports experimental results that, measured by precision and recall, improve on other single-method mail filters.
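The classification and evaluation steps above can be illustrated with a minimal sketch. The abstract does not define the two bias degrees or how the separating line is found, so the code assumes each message has already been mapped to a point (x, y) and that the line a*x + b*y + c = 0 is given; the convention that the positive side means spam, and every function name, are hypothetical. The fold split and the precision and recall formulas are the standard ones.

```python
import math


def signed_distance(point, line):
    """Signed distance from (x, y) to the line a*x + b*y + c = 0."""
    x, y = point
    a, b, c = line
    return (a * x + b * y + c) / math.hypot(a, b)


def classify_point(point, line):
    # Assumption: points on the positive side of the line are treated as spam.
    return "spam" if signed_distance(point, line) > 0 else "ham"


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds over n items."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def precision_recall(y_true, y_pred, positive="spam"):
    """Precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: the line x - y = 0 separates points whose first
# coordinate (assumed spam tendency) exceeds the second (assumed ham tendency).
line = (1.0, -1.0, 0.0)
print(classify_point((2.0, 0.0), line), classify_point((0.0, 2.0), line))
print(list(k_fold_indices(6, 3)))
print(precision_recall(["spam", "spam", "ham", "ham"], ["spam", "ham", "spam", "ham"]))
```

In a full experiment, each (train, test) pair would be used to rebuild the feature base and the classifier on the training indices and to score precision and recall on the held-out indices, with the results averaged over the N folds.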
Keywords/Search Tags: Spam filtering, Naïve Bayes, Text classification, Feature extraction