
Research and Implementation of Spam Samples Analysis Technology

Posted on: 2011-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: D K Zhang
Full Text: PDF
GTID: 2248330395457775
Subject: Computer application technology
Abstract/Summary:
Spam has become a serious nuisance for Internet users: it disrupts daily life, wastes users' time, and lowers the level of information security. Many kinds of anti-spam technology have been studied, the most important being rule-based filtering and content-based filtering. Both need large numbers of spam samples, to form the rules and to train the classifier, so spam sample analysis technology has to be studied in order to capture, de-duplicate, and store samples and to select spam features.

For sample acquisition, a sample block mechanism based on the mail gateway has been designed. For sample de-duplication, a two-stage mechanism has been designed, consisting of first-stage de-duplication and in-depth de-duplication. A spam sample clustering algorithm based on a similarity measure has been designed, and on top of this algorithm a two-level data storage structure has been finalized for mass spam sample storage.

The main focus of the study is the feature selection algorithm based on spam sample analysis. For the requirement "recognize most spam accurately with fewer features", a feature selection mechanism based on both an ant colony optimization algorithm and a genetic algorithm has been designed. For the requirement "recognize spam accurately with a wider set of features", a hybrid optimization of feature selection and kNN classification has been designed.

The mechanisms and algorithms have been implemented as a system. The spam sample analysis system consists mainly of a data analysis engine and an analysis results distribution platform. A test platform has also been designed to test feasibility, validity, and practicability. The test results show that the sample analysis technology satisfies the requirements of the spam comprehensive reporting system and lowers the miss rate and false rate of the spam filtering system.
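
The abstract does not say how the two de-duplication stages work internally. As a rough illustration only, the sketch below assumes that the first stage drops exact duplicates by hashing normalized text and that the in-depth stage drops near duplicates using Jaccard similarity over word shingles. The function names, the shingle size `k`, and the `threshold` value are all hypothetical and are not taken from the thesis.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash alike."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of an already normalized message."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(samples, threshold=0.6):
    """Two-stage de-duplication (assumed design, not the thesis's own).

    Stage 1: skip samples whose normalized text has an already seen hash.
    Stage 2: skip samples whose shingle set is at least `threshold`
             similar to a sample that was already kept.
    """
    seen_hashes = set()
    kept = []  # list of (original text, shingle set)
    for raw in samples:
        norm = normalize(raw)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # stage 1: exact duplicate
            continue
        seen_hashes.add(digest)
        sh = shingles(norm)
        if any(jaccard(sh, other) >= threshold for _, other in kept):
            continue  # stage 2: near duplicate
        kept.append((raw, sh))
    return [raw for raw, _ in kept]
```

The second stage here compares each new sample against every kept sample, which is quadratic in the number of samples; a system handling mass sample storage would presumably need an indexed or clustered comparison instead, which may be one role of the similarity-based clustering the abstract mentions.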
Keywords/Search Tags: Spam Samples, Samples De-duplication, Similarity Measure, Feature Selection, Optimization Algorithm