
The Research And Implement On The Chinese Anti-Spam Filtering System Based On Advanced Winnow Algorithm

Posted on: 2009-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: L H Gu
Full Text: PDF
GTID: 2178360272477152
Subject: Computer application technology
Abstract/Summary:
With the widespread use of the Internet, e-mail has become an important means of everyday communication. However, spam, which carries commercial advertisements, virus programs, and sensitive content, threatens system security and causes considerable inconvenience in people's lives. Fighting spam has therefore become a significant problem worldwide.

This dissertation discusses content-based spam filtering techniques. Taking the characteristics of Chinese spam into account, a Chinese anti-spam engine based on automatic text categorization has been designed and implemented. The engine consists of four parts: preprocessing, training, categorization, and feedback.

For preprocessing, mail decoding, Chinese word segmentation, feature selection, and the vector representation of mail are discussed. Chinese word segmentation uses the Chinese lexical analysis system ICTCLAS, developed by the Institute of Computing Technology, and feature extraction uses the mutual information method.

Training and categorization are the focus of this dissertation. First, the exponential form of the Balanced Winnow algorithm is derived from the relation between the exponential form and the factorial form of the Basic Winnow algorithm. Second, to address the jittery behavior of the Basic Winnow algorithm, an improved spam filtering algorithm named Review Winnow is proposed. The new algorithm effectively reduces the jitter, and its loss function describes the inherent loss caused by misclassification more faithfully. Third, by removing the outliers from the mail set and applying an improved Boosting algorithm, the ADOR-Winnow mail classifier is constructed, and its performance is greatly improved.
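
As a point of reference for the Winnow family discussed above, the following is a minimal sketch of the textbook Balanced Winnow update over binary word features. It is not the Review Winnow or ADOR-Winnow algorithm of the dissertation, whose details the abstract does not give. The class name, the parameter values (alpha, beta, theta), and the initial weight of 1.0 are illustrative assumptions.

```python
from typing import Dict, Sequence, Set, Tuple


class BalancedWinnow:
    """Textbook Balanced Winnow on binary (present/absent) features.

    Assumption: alpha, beta, theta and the initial weight 1.0 are
    illustrative defaults, not values taken from the dissertation.
    """

    def __init__(self, alpha: float = 1.5, beta: float = 0.5, theta: float = 1.0):
        if not (alpha > 1.0 and 0.0 < beta < 1.0):
            raise ValueError("need alpha > 1 and 0 < beta < 1")
        self.alpha = alpha          # promotion factor
        self.beta = beta            # demotion factor
        self.theta = theta          # decision threshold
        self.pos: Dict[str, float] = {}   # positive weight per feature
        self.neg: Dict[str, float] = {}   # negative weight per feature

    def score(self, features: Set[str]) -> float:
        # Unseen features have equal positive and negative weight,
        # so they contribute nothing to the score.
        return sum(self.pos[f] - self.neg[f] for f in features if f in self.pos)

    def predict(self, features: Set[str]) -> bool:
        """Return True if the message is classified as spam."""
        return self.score(features) > self.theta

    def update(self, features: Set[str], is_spam: bool) -> bool:
        """Mistake-driven update; returns True if a mistake was corrected."""
        if self.predict(features) == is_spam:
            return False
        # Missed spam: promote positive weights, demote negative ones.
        # False alarm on ham: the reverse.
        up, down = (self.alpha, self.beta) if is_spam else (self.beta, self.alpha)
        for f in features:
            self.pos[f] = self.pos.get(f, 1.0) * up
            self.neg[f] = self.neg.get(f, 1.0) * down
        return True

    def train(self, examples: Sequence[Tuple[Set[str], bool]], epochs: int = 5) -> int:
        """Run up to `epochs` passes; return the total number of mistakes."""
        mistakes = 0
        for _ in range(epochs):
            errors = sum(self.update(feats, label) for feats, label in examples)
            mistakes += errors
            if errors == 0:      # a clean pass: weights are stable
                break
        return mistakes
```

Because weights change only when a message is misclassified, a single noisy message can push them back and forth between passes, which is one plausible reading of the jitter that the abstract attributes to the basic algorithm.
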
Finally, experiments show that the Balanced R-Winnow algorithm effectively reduces the jittery behavior and that the ADOR-Winnow mail classifier performs very well.

For feedback, a grid-based feedback learning model is presented. By classifying clients, the model divides feedback into three levels rather than the previous two: system level, domain level, and client level. This improvement not only facilitates coordinated filtering between groups and centralized feedback learning, but also improves the filtering performance of the mail classifier.
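
The mutual information method mentioned under preprocessing can be illustrated with a short sketch. The abstract does not state which formulation the dissertation uses, so this assumes the pointwise form common in text categorization, MI(t, spam) = log(P(t, spam) / (P(t) P(spam))), estimated from document counts. The function names and the choice to score terms against the spam class only are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def mutual_information_scores(docs: Sequence[Tuple[Set[str], bool]]) -> Dict[str, float]:
    """Score each term by pointwise mutual information with the spam class.

    Assumption: MI(t, spam) = log(A * N / ((A + B) * N_spam)), where A is the
    number of spam documents containing t, A + B the number of all documents
    containing t, N the corpus size, and N_spam the number of spam documents.
    """
    n = len(docs)
    n_spam = sum(1 for _, is_spam in docs if is_spam)
    if n_spam == 0:
        raise ValueError("need at least one spam document")

    df: Counter = Counter()        # documents containing the term
    df_spam: Counter = Counter()   # spam documents containing the term
    for terms, is_spam in docs:
        for t in terms:
            df[t] += 1
            if is_spam:
                df_spam[t] += 1

    scores: Dict[str, float] = {}
    for t, count in df.items():
        a = df_spam[t]
        # A term never seen in spam has zero joint probability: MI is -inf.
        scores[t] = math.log(a * n / (count * n_spam)) if a > 0 else float("-inf")
    return scores


def select_features(docs: Sequence[Tuple[Set[str], bool]], k: int) -> List[str]:
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = mutual_information_scores(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

In this form the score favors rare terms, since a word that appears once and only in spam ranks as high as a frequent spam word, so practical systems often combine it with a minimum document-frequency cutoff.
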
Keywords/Search Tags: Spam Mail, Mail Decoding, Chinese Phrase Segmentation, Feature Selection, Winnow Algorithm, Loss Function, AdaBoost Algorithm, Outlier, Feedback Study