| With the rapid development of the Internet, e-mail has become an important communication tool. However, junk mail is flooding the Internet, consuming network resources and bandwidth and reducing users' work efficiency. Spam has therefore become a significant global problem. Currently, spam is countered through anti-spam legislation and mail-filtering technologies. Although a variety of e-mail filtering technologies exist, we believe that filtering spam by its semantics or content is the essential approach. Content-based filtering relies mainly on rule-based methods or on probabilistic and statistical methods, of which the Bayesian model is a typical example. In this thesis, we carry out a systematic study based on the characteristics of spam. Building on Bayesian theory, we propose an improved Bayesian spam filtering model. The model first extracts text features using mutual information, a method from computational linguistics, and then classifies mail with an improved Bayesian method. Finally, we present our experiments and results. ... |
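
The two-stage pipeline described in the abstract (mutual-information feature selection followed by Bayesian classification) can be sketched roughly as follows. This is a minimal illustration only: the abstract does not specify the thesis's improved Bayesian method, so a standard Bernoulli naive Bayes classifier with Laplace smoothing stands in for it here. The function names, the toy corpus, and the choice of three features are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Mutual information (in bits) between each term's presence and the class.

    docs is a list of token sets, labels the matching list of class names.
    """
    n = len(docs)
    classes = set(labels)
    class_count = Counter(labels)
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        mi = 0.0
        for present in (True, False):
            # P(term present) or P(term absent), depending on `present`.
            p_term = sum(1 for d in docs if (term in d) == present) / n
            for c in classes:
                joint = sum(
                    1 for d, y in zip(docs, labels)
                    if (term in d) == present and y == c
                )
                if joint == 0:
                    continue  # a zero joint probability contributes nothing
                p_joint = joint / n
                p_class = class_count[c] / n
                mi += p_joint * math.log2(p_joint / (p_term * p_class))
        scores[term] = mi
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-round(scores[t], 9), t))[:k]


def train_naive_bayes(docs, labels, features):
    """Standard Bernoulli naive Bayes (an assumption, not the thesis's variant)."""
    n = len(docs)
    model = {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior = len(class_docs) / n
        # Laplace-smoothed estimate of P(term present | class).
        likelihood = {
            t: (sum(1 for d in class_docs if t in d) + 1) / (len(class_docs) + 2)
            for t in features
        }
        model[c] = (prior, likelihood)
    return model


def classify(model, doc):
    """Return the class with the highest log-posterior for a token set."""
    best, best_score = None, -math.inf
    for c, (prior, likelihood) in model.items():
        score = math.log(prior)
        for t, p in likelihood.items():
            score += math.log(p if t in doc else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy corpus, for illustration only.
docs = [
    {"win", "free", "prize"},
    {"free", "offer", "click"},
    {"meeting", "schedule", "project"},
    {"project", "report", "free"},
]
labels = ["spam", "spam", "ham", "ham"]

scores = mutual_information(docs, labels)
features = select_features(scores, 3)
model = train_naive_bayes(docs, labels, features)
print(features)
print(classify(model, {"project", "report"}))
```

Selecting features before training keeps the classifier's vocabulary small and focused on terms whose presence or absence is informative about the class, which is the role the abstract assigns to mutual information.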