Research On E-mail Filter Based On Genetic Algorithm And Naive Bayes Classification

Posted on: 2008-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: X J Liu
GTID: 2178360242460769
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, e-mail has become one of the fastest and most economical means of communication, while spam has become a serious social problem. It is therefore important to develop an effective spam-filtering system. Text classification techniques based on content analysis are increasingly applied to mail filtering and have become a focus of current research; among them, the Naive Bayes approach is an important method. This dissertation combines a genetic algorithm with Naive Bayes classification to build a Chinese mail-filtering model. Its main work is as follows:
(1) Chinese mails are segmented into words with the N-shortest-path method, the text is represented with the vector space model, and features are then extracted.
(2) Based on the principle of the Naive Bayes classifier, a Chinese mail-filtering model based on a genetic algorithm and a Naive Bayes classifier is designed and implemented, with the genetic algorithm used to refine the traditional Naive Bayes model. A GBFT algorithm is also proposed to calculate the ratio among the three important components of a mail in the filtering process, namely the sender address, the subject and the body, in order to achieve higher accuracy and completeness of filtering. Experiments show that this algorithm performs better.
(3) When a Bayesian classifier is used to classify mails, a probability is usually calculated and compared with a threshold value to judge whether a mail is spam or legitimate. This dissertation proposes a method for determining the threshold value by examining the influence of the threshold on the test results. The method yields a comparatively reasonable threshold value and thereby improves the accuracy of the results.
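The abstract does not give the details of the GBFT algorithm or of the threshold-selection method, so the following is only a minimal sketch of the general idea in points (2) and (3), not the thesis's implementation. It assumes that three separate Naive Bayes models have already produced a spam probability for the sender address, the subject and the body of each mail. A small genetic algorithm then searches for the weights of the three fields, and a threshold sweep picks the cut-off with the best accuracy. The data in `SAMPLES`, the function names, and the choices of crossover, mutation and fitness are all hypothetical.

```python
import random

# Hypothetical per-field spam probabilities (sender, subject, body), assumed to
# come from three Naive Bayes models, with the true label (1 = spam, 0 = legitimate).
SAMPLES = [
    ((0.90, 0.80, 0.70), 1),
    ((0.20, 0.90, 0.85), 1),
    ((0.70, 0.40, 0.90), 1),
    ((0.60, 0.75, 0.65), 1),
    ((0.10, 0.30, 0.20), 0),
    ((0.80, 0.20, 0.10), 0),
    ((0.30, 0.50, 0.25), 0),
    ((0.40, 0.10, 0.35), 0),
]


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return tuple(w / total for w in weights)


def score(probs, weights):
    """Weighted combination of the three per-field spam probabilities."""
    return sum(p * w for p, w in zip(probs, weights))


def accuracy(weights, threshold, samples=SAMPLES):
    """Fraction of samples classified correctly at the given threshold."""
    correct = sum((score(p, weights) >= threshold) == bool(y) for p, y in samples)
    return correct / len(samples)


def best_threshold(weights, samples=SAMPLES):
    """Sweep thresholds 0.01..0.99 and keep the one with the highest accuracy."""
    candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: accuracy(weights, t, samples))


def fitness(weights):
    """Fitness of a weight vector: its accuracy at its own best threshold."""
    return accuracy(weights, best_threshold(weights))


def evolve(generations=30, pop_size=20, seed=0):
    """Tiny genetic algorithm over the three field weights (illustrative only)."""
    rng = random.Random(seed)
    pop = [normalize([rng.random() + 1e-6 for _ in range(3)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            i = rng.randrange(3)
            child[i] = max(1e-6, child[i] + rng.gauss(0, 0.1))  # mutate one gene
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=fitness)


weights = evolve()
threshold = best_threshold(weights)
print("weights:", weights)
print("threshold:", threshold, "accuracy:", accuracy(weights, threshold))
```

In practice the weights and the threshold would be tuned on a held-out validation set rather than on the data used to report accuracy, and the fitness function could reward recall as well as accuracy.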
Keywords/Search Tags: spam, data mining, Bayesian classifier, Chinese E-mail filter, Vector Space Model, genetic algorithm