Research and System Design of a Spam Filtering Method Based on Attribute Theory

Posted on: 2006-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Gan
Full Text: PDF
GTID: 2208360182956378
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, online electronic information is booming, and e-mail has become the fastest and most economical form of communication available. Unfortunately, the flood of spam also causes users considerable inconvenience: it wastes their time and quickly fills up mail server storage. As the problem grows, automated methods for separating spam from legitimate e-mail are becoming necessary.

Spam filtering methods fall into two categories. The first restricts spam with the help of black and white lists and user-defined rules. This approach is problematic. First, it assumes that end users are able to build a rule set by hand. Moreover, because the nature of spam changes over time, users have to update the rule set constantly. In addition, it is prone to misclassifying mail. The second category learns directly from the data in the user's mail repository; methods of this type take a long time to train. Many current filters combine methods from both categories, as in an N-tier filter. For example, a basic layer consists of a white list, white words, a black list, black words, and so on, while an advanced layer applies machine-learning algorithms to the e-mail content.

In this thesis, a new filtering model is presented. Combined with black and white list filtering and user-defined rules, the model uses Attribute Theory as its filtering algorithm. This is the first time that Attribute Theory has been applied to e-mail filtering. From a theoretical point of view, e-mail feature recognition can be regarded as a complex property judgment based on conjunction, and a qualitative mapping that takes an interval array as its qualitative criterion can be interpreted as a qualitative judgment operation determined by multidimensional attributes.
Therefore, we can use qualitative mapping with an interval array as the qualitative criterion to filter e-mails.

To improve filtering efficiency, the feature space is organized as an index structure, and separate feature repositories are created for spam and legitimate mail. Following Attribute Theory, we introduce a weight of {0,ε} to indicate the significance of each vector in the feature space. We then use the weighted feature vector as the qualitative criterion to establish a separate qualitative mapping model for each mail to be tested. To capture the degree to which a new e-mail belongs to the spam or the legitimate class, a conversion degree function is also introduced. If a feature of the tested mail is found in the repository of spam or legitimate features, the conversion degree function is used to evaluate the relationship between the feature's weight in the tested mail and its weight in the repository. Finally, the tested mail is classified according to its cumulative score.

Empirical results show that our filter achieves good precision and recall. They also indicate that applying Attribute Theory to e-mail filtering is original and feasible, which paves the way for further research on e-mail filtering.

Gan Tangyi (Computer Science) Directed by Prof. Feng Jiali...
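
The layered pipeline described in the abstract (list checks first, then a cumulative score over features found in the spam and legitimate repositories) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract gives neither the form of the conversion degree function, nor the weights, nor the decision threshold, so `conversion_degree`, the repository contents, the addresses, and every other name here are hypothetical. The interval-array qualitative criterion is not modeled.

```python
from collections import Counter

# Hypothetical feature repositories (feature -> weight); all values are invented.
SPAM_FEATURES = {"free": 0.9, "winner": 0.8, "offer": 0.6}
HAM_FEATURES = {"meeting": 0.8, "report": 0.7, "offer": 0.2}

# Hypothetical sender lists for the basic layer.
WHITELIST = {"boss@example.com"}
BLACKLIST = {"promo@spam.example"}


def feature_weights(text):
    """Return the relative frequency of each lowercase token in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: n / total for tok, n in counts.items()} if total else {}


def conversion_degree(mail_weight, repo_weight):
    """Assumed stand-in: relate the weight of a feature in the tested mail
    to its weight in the repository by taking their product."""
    return mail_weight * repo_weight


def classify(sender, text, threshold=0.0):
    """Classify one message as 'spam' or 'legitimate'."""
    # Basic layer: white and black lists decide immediately.
    if sender in WHITELIST:
        return "legitimate"
    if sender in BLACKLIST:
        return "spam"
    # Advanced layer: add up evidence from both feature repositories.
    score = 0.0
    for feature, weight in feature_weights(text).items():
        if feature in SPAM_FEATURES:
            score += conversion_degree(weight, SPAM_FEATURES[feature])
        if feature in HAM_FEATURES:
            score -= conversion_degree(weight, HAM_FEATURES[feature])
    return "spam" if score > threshold else "legitimate"
```

Under these assumptions, a call such as `classify("someone@example.org", "free winner free")` accumulates only spam evidence and returns `"spam"`, while a whitelisted sender is accepted regardless of content. A message with no known features scores zero and is treated as legitimate, which is one possible design choice for the threshold, not something the abstract specifies.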
Keywords/Search Tags:Spam E-mail Filtering, Attribute Theory, Qualitative Mapping, Conversion Degree Function