Research And Implementation Of Anytime Spam Filtering Key Technologies

Posted on: 2014-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Jia
Full Text: PDF
GTID: 2268330401966243
Subject: Software engineering
Abstract/Summary:
With the advent of the information age and the rapid development of science and technology, e-mail has become an essential means of communication in modern life and work. However, the rapid development and wide adoption of e-mail systems also led some people to see huge commercial interests in them, and so spam emerged. Because spam seriously harms the interests of enterprises and individuals, laws against spamming have been enacted, and spam filtering has become a hot research field. Several spam filtering techniques are in common use: e-mail-header-based filtering, rule-based filtering, statistics-based filtering, and so on. Because spamming techniques keep diversifying, a great deal of effort is needed to improve the performance of ordinary spam filtering techniques, and statistics-based filtering can be a good solution to this problem. Meanwhile, users' requirements for e-mail systems have risen: they want not only a high interception rate but also a short processing time. Most anti-spam technologies, however, need a fixed amount of computation time. An anytime classification model can be used to solve this problem.

This dissertation first introduces e-mail and spam, including the history and dangers of spam. It then analyzes the principles and defects of e-mail, as well as the advantages and disadvantages of current anti-spam technologies. After that, it introduces technologies related to statistics-based spam filtering and proposes several improved anti-spam methods, which turn out to perform better. Finally, a spam filtering system is developed based on these methods. The main results are as follows:

1. Based on anytime classification, we improve spam filtering by using Super-Parents. We select Super-Parents in three different ways, namely information gain, CHI, and mutual information, and then process each e-mail with the resulting ordered set of Super-Parents. The three methods yield different results, all of which are better than those of AAPE.

2. We develop a spam filtering system. The system combines several spam filtering techniques. First, black/white lists and keyword filtering handle part of the e-mail quickly. Then the second filtering module, designed earlier, handles the rest. The system can respond to users' needs in a timely manner.
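
The three selection criteria named in result 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how Super-Parents are represented or scored, so the sketch assumes binary term-presence features, a spam/ham label, and the pointwise form of mutual information between a term and the spam class. The names `term_scores`, `rank_terms`, `docs`, and `labels` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def term_scores(docs, labels):
    """Score every term by information gain, chi-square, and pointwise MI.

    docs   -- non-empty list of token collections, one per e-mail (assumed input)
    labels -- list of booleans, True meaning spam (assumed input)
    """
    n = len(docs)
    n_spam = sum(labels)
    n_ham = n - n_spam
    spam_df, ham_df = Counter(), Counter()
    for tokens, is_spam in zip(docs, labels):
        # Count each term at most once per message (document frequency).
        (spam_df if is_spam else ham_df).update(set(tokens))

    h_class = entropy([n_spam / n, n_ham / n])
    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df[term]  # spam messages containing the term
        b = ham_df[term]   # ham messages containing the term
        c = n_spam - a     # spam messages without the term
        d = n_ham - b      # ham messages without the term

        # Information gain: H(class) - H(class | term present or absent).
        h_cond = 0.0
        for spam_count, ham_count in ((a, b), (c, d)):
            total = spam_count + ham_count
            if total:
                h_cond += total / n * entropy(
                    [spam_count / total, ham_count / total]
                )
        ig = h_class - h_cond

        # Chi-square statistic for the 2x2 contingency table.
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0

        # Pointwise mutual information between term presence and spam.
        mi = math.log2(a * n / ((a + b) * (a + c))) if a else float("-inf")

        scores[term] = {"ig": ig, "chi": chi, "mi": mi}
    return scores


def rank_terms(scores, criterion):
    """Order terms from highest to lowest score; ties break alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t][criterion], t))


# Toy data, invented purely for illustration.
docs = [
    {"free", "money"},
    {"free", "offer"},
    {"meeting", "notes"},
    {"project", "meeting"},
]
labels = [True, True, False, False]
scores = term_scores(docs, labels)
ordered_by_ig = rank_terms(scores, "ig")
```

Each criterion yields its own ordering of candidate features, which is consistent with the abstract's remark that the three methods give different results. An anytime classifier could then consume such an ordered list one feature at a time and stop whenever its time budget runs out.
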
Keywords/Search Tags: Anti-spam, Anytime Classification Model, information gain, CHI, mutual information