The Study And Application Of Spam Filtering System Based On Rough Set

Posted on: 2009-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2178360278971148
Subject: Computer software and theory
Abstract/Summary:
With the widespread use of e-mail on the Internet, spam has become an increasingly serious problem. It not only consumes network bandwidth and the time and space resources of computers, but also disrupts the normal operation of enterprises and the normal work of users. Solving the spam problem requires comprehensive means, including both legal and technical measures. Current spam filtering technologies at home and abroad include blacklists and whitelists, keyword-based filtering, and content-based filtering. This paper gives a detailed introduction to the research status of content-based spam filtering, currently the mainstream technology for the problem, whose two research directions are the rule-based content analysis approach and the statistics-based content analysis approach. Because rough set theory can, without any prior information, derive the decision rules of a problem through attribute reduction while maintaining classification capability, we introduce rough set theory into spam filtering on the basis of content-based filtering, which is a new research direction for filtering spam. First, this paper studies and analyzes the classical attribute reduction algorithm based on rough set theory and proposes an improved attribute reduction algorithm based on rough sets.
Experiments showed that the improved algorithm is feasible and efficient, and especially suitable for large data sets. Second, the system model of spam filtering based on rough sets and its workflow are described in detail. For feature selection in spam filtering, we adopted the improved algorithm to remove redundant and irrelevant features, and combined seven characteristics of the mail header with the characteristics of the mail body to classify an e-mail, in order to improve the filtering system's accuracy and reduce its error rate in classifying spam. Finally, three groups of comparative experiments showed that spam filtering based on rough sets is feasible and efficient.
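
As an illustration of the kind of attribute reduction the abstract refers to, the following is a minimal sketch of a greedy, positive-region-based reduction over a toy decision table. It is not the improved algorithm proposed in the thesis, whose details the abstract does not give. The function names and the feature names (`has_link`, `all_caps_subject`, `has_attachment`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under `attrs` has one decision value."""
    classes = defaultdict(list)
    for row in rows:
        # Rows with identical values on `attrs` are indiscernible.
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)


def greedy_reduct(rows, condition_attrs, decision):
    """Add attributes one by one until the full positive region is preserved."""
    target = positive_region_size(rows, condition_attrs, decision)
    reduct = []
    while positive_region_size(rows, reduct, decision) < target:
        remaining = [a for a in condition_attrs if a not in reduct]
        # Pick the attribute that enlarges the positive region the most.
        best = max(
            remaining,
            key=lambda a: positive_region_size(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct


# Hypothetical toy decision table: three condition attributes, one decision.
rows = [
    {"has_link": 1, "all_caps_subject": 1, "has_attachment": 0, "spam": 1},
    {"has_link": 1, "all_caps_subject": 0, "has_attachment": 0, "spam": 1},
    {"has_link": 0, "all_caps_subject": 1, "has_attachment": 1, "spam": 0},
    {"has_link": 0, "all_caps_subject": 0, "has_attachment": 1, "spam": 0},
    {"has_link": 0, "all_caps_subject": 0, "has_attachment": 0, "spam": 0},
]
conditions = ["has_link", "all_caps_subject", "has_attachment"]

print(greedy_reduct(rows, conditions, "spam"))  # prints ['has_link']
```

In this toy table, `has_link` alone separates spam from non-spam as well as all three attributes together, so the other two are dropped as redundant. A greedy pass like this can leave superfluous attributes in the result on other data, so a full implementation would typically add a pruning step afterwards.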
Keywords/Search Tags:E-mail, Rough Set, Attribute reduction, Spam filtering