| With the rapid expansion of information networks, e-mail, as a fast and convenient means of communication, has become ever more widely accepted in people's daily lives. At the same time, however, the massive spread of junk mail (spam) poses a serious threat and has had a negative impact worldwide, drawing global attention. Although a comprehensive and effective solution requires collaboration across legislation, administration and better technical specifications, anti-spam technology remains the most practical approach for now. This paper aims to lower the false-positive rate of anti-spam systems and to make them better able to adapt to the many types and variants of spam. To this end, the research focuses on content-based filtering techniques, including Bayesian statistics, the Distributed Checksum Clearinghouse (DCC) and heuristic analysis and detection. Using these methods, an anti-spam system is constructed. In the course of this work, the filtering interfaces at the MTA (Mail Transfer Agent) and MDA (Mail Delivery Agent) layers are improved, the user feedback mechanism is made easier to use, and personalized control is supported; together, these measures strengthen the overall defensive capability of the system. Moreover, from the perspective of real-world deployment, the anti-spam system is verified on two different mail systems, and the advantages and disadvantages of each are compared. Finally, this paper investigates sets of evaluation metrics for assessing an anti-spam system, using k-fold cross-validation on a broadly collected set of mail samples. The results indicate that the system is effective in improving content-based anti-spam filtering. |
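The abstract names Bayesian statistics as one of the content-based filtering techniques, k-fold cross-validation as the evaluation method, and the false-positive rate as the key metric to lower. The following is a minimal Python sketch of how these pieces fit together. It is not the system described in this paper, and it makes several assumptions: a multinomial naive Bayes classifier with Laplace smoothing, a simple regex tokenizer, a round-robin fold split, and the hypothetical names `NaiveBayesFilter` and `k_fold_false_positive_rate`. It does not model DCC or heuristic rules.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes spam filter with Laplace smoothing."""

    def __init__(self, threshold=0.9):
        # A high threshold trades some spam recall for fewer false positives.
        self.threshold = threshold

    def fit(self, messages, labels):
        # labels: True for spam, False for legitimate mail (ham).
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        for text, is_spam in zip(messages, labels):
            self.counts[is_spam].update(tokenize(text))
            self.docs[is_spam] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def spam_probability(self, text):
        n_docs = self.docs[True] + self.docs[False]
        log_score = {}
        for c in (True, False):
            # Smoothed class prior plus smoothed per-token likelihoods, in log space.
            log_p = math.log((self.docs[c] + 1) / (n_docs + 2))
            denom = self.totals[c] + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    log_p += math.log((self.counts[c][tok] + 1) / denom)
            log_score[c] = log_p
        # Logistic of the log-odds, written so math.exp never overflows.
        d = log_score[True] - log_score[False]
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold


def k_fold_false_positive_rate(messages, labels, k=5, threshold=0.9):
    """Hypothetical helper: pooled (false-positive rate, spam recall) over k folds."""
    fp = tn = tp = fn = 0
    n = len(messages)
    for fold in range(k):
        # Round-robin split without shuffling (assumption; shuffle real data first).
        test_idx = range(fold, n, k)
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = NaiveBayesFilter(threshold).fit(
            [messages[i] for i in train], [labels[i] for i in train]
        )
        for i in test_idx:
            predicted = model.is_spam(messages[i])
            if labels[i] and predicted:
                tp += 1
            elif labels[i]:
                fn += 1
            elif predicted:
                fp += 1  # legitimate mail wrongly flagged: a false positive
            else:
                tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return fpr, recall
```

Raising `threshold` in this sketch lowers the false-positive rate at the cost of spam recall, which is the trade-off the abstract targets. Pooling the confusion counts across folds, rather than averaging per-fold rates, is one common choice among several.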