
Research And Implementation Of Spam Filtering Technology Based On AAPE Classification Model

Posted on: 2015-04-02; Degree: Master; Type: Thesis
Country: China; Candidate: Z Z Xu; Full Text: PDF
GTID: 2308330473953328; Subject: Computer software and theory
Abstract/Summary:
The first junk email was sent in 1978, when a sales representative of DEC sent an advertisement for the new DEC-20 computer to ARPANET users on the U.S. West Coast by email. Since then, junk email has crowded into people's lives and has had a large impact on Internet users. According to the "Chinese Anti-Spam Survey Report of the First Quarter in 2013", Chinese e-mail users received an average of 14.6 spam messages per week, accounting for 37.37% of all messages received. Spam wastes time, computing resources and network bandwidth, spreads viruses, harms users' mood and causes economic loss. As the problem has worsened, researchers have proposed many spam filtering technologies to keep spam from flooding the Internet. Current technologies distinguish spam from regular email well, but they usually require a fixed amount of computation time, which makes it difficult to meet users' real-time requirements. To address this problem, researchers proposed a classification model named AAPE (Anytime Averaged Probabilistic Estimators). The AAPE classification model, presented by Dr. Yang Ying, is an anytime classification model based on Bayesian estimation; that is, it can return a usable result whenever it is interrupted and refines that result as more computation time becomes available.

This thesis first introduces the background and hazards of spam and the working principle of e-mail, and then points out the e-mail vulnerabilities that spam can exploit. Next, it analyzes the advantages and disadvantages of the AAPE classification model and proposes an improved AAPE classification model for spam. Finally, the improved model is tested and the results are analyzed to demonstrate its efficiency relative to the original model.

The main contributions of this study are as follows. First, improvements are introduced into the traditional AAPE classification model: according to the degree of correlation among feature items, three methods (Expected Cross Entropy, Chi-Square and Mutual Information) are used to compute the strongly cross-correlated feature items, which are then applied to spam filtering. Second, the improved model is analyzed on the basis of experimental results and is shown to be substantially faster and more accurate than the original AAPE classification model. Finally, a two-layer filtering system built on the improved AAPE classification model is designed. The first layer filters all messages quickly and coarsely using blacklist and whitelist technology; the second layer applies intelligent filtering based on the AAPE classification model for deeper filtering. This design ensures both the real-time performance and the accuracy of the system.
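The abstract names three feature-scoring methods but does not give the formulas the thesis uses. As a rough illustration only, the sketch below computes the textbook definitions of each score for a single term from a 2x2 table of document counts. The function names and the count layout (`a`, `b`, `c`, `d`) are assumptions made for this example and are not taken from the thesis.

```python
import math

# Hypothetical count layout for one term (an assumption for this sketch):
#   a = spam messages containing the term
#   b = regular messages containing the term
#   c = spam messages without the term
#   d = regular messages without the term

def chi_square(a, b, c, d):
    """Chi-square statistic between term presence and the spam class."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association is measurable
    return n * (a * d - b * c) ** 2 / denom

def mutual_information(a, b, c, d):
    """Pointwise mutual information: log(P(term, spam) / (P(term) * P(spam)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # the term never co-occurs with spam
    return math.log(a * n / ((a + b) * (a + c)))

def expected_cross_entropy(a, b, c, d):
    """P(term) * sum over classes of P(class|term) * log(P(class|term) / P(class))."""
    n = a + b + c + d
    with_term = a + b
    if n == 0 or with_term == 0:
        return 0.0
    score = 0.0
    # (messages of the class containing the term, total messages of the class)
    for in_class_with_term, class_total in ((a, a + c), (b, b + d)):
        if in_class_with_term == 0 or class_total == 0:
            continue  # a zero-probability term contributes nothing to the sum
        p_class_given_term = in_class_with_term / with_term
        p_class = class_total / n
        score += p_class_given_term * math.log(p_class_given_term / p_class)
    return (with_term / n) * score
```

Under these definitions, a term that appears equally often in spam and regular mail scores zero on all three measures, while a term that appears only in spam scores high. How the thesis combines the three scores to pick its feature items is not stated in the abstract.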
Keywords/Search Tags: Spam Filtering Technology, AAPE Classification Model, Feature Weight Selection, Expected Cross Entropy, Chi-Square