Research And Implementation Of A Chinese Anti-Spam Engine Based On Automatic Categorization

Posted on: 2007-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: H G Chen
GTID: 2178360185459877
Subject: Computer application technology
Abstract/Summary:
With the widespread use of the Internet, e-mail has become a necessity of daily life and an important means of Internet communication. However, spam has become a threat to system security, and anti-spam research has therefore become a significant global problem. Among the various techniques for dealing with spam, mail filtering is an effective one, especially filtering based on automatic text categorization, which is flexible and efficient.

This dissertation introduces e-mail technology, the definition and harms of spam, and common anti-spam techniques and their characteristics. Taking the characteristics of Chinese mail into account, it studies the techniques a Chinese spam-filtering engine requires, and designs and implements a Chinese anti-spam engine based on automatic text categorization. It discusses the overall architecture of the engine, the design of the preprocessing module, the training module and the classification module, together with the related techniques, and studies how each module is implemented.

For Chinese word segmentation, an index-based segmentation method is proposed and implemented; it segments text more efficiently than traditional mechanical (dictionary-matching) segmentation. For feature extraction, Mutual Information is used; the shortcomings of the traditional Mutual Information method are analyzed and improvements are proposed. For mail representation, a method that improves on the traditional Vector Space Model and is better suited to Bayesian computation is proposed. The engine was tested on a large data set, and the results are discussed and analyzed at the end of the dissertation. To strengthen feedback learning, the dissertation proposes a mutual-learning approach and introduces a feature server; the implementation of the feature server and the extension of the desktop engine are also described.
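Before segmentation, the preprocessing module has to turn raw mail into plain Chinese text, which is what the "Mail decoding" keyword refers to. The following is a minimal sketch of that step using Python's standard `email` package, not the thesis's own implementation. It decodes an RFC 2047 encoded Subject header and undoes base64 or quoted-printable transfer encoding on `text/plain` parts. The function names `decode_subject` and `extract_text`, and the GB18030 fallback used when a part declares no charset, are assumptions made for illustration.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.message import Message


def decode_subject(msg: Message) -> str:
    """Decode an RFC 2047 encoded Subject header such as =?gb2312?B?...?=."""
    raw = msg.get("Subject", "")
    return str(make_header(decode_header(raw)))


def extract_text(msg: Message) -> str:
    """Concatenate every text/plain part, undoing the transfer encoding
    (base64 / quoted-printable) and converting from the declared charset."""
    chunks = []
    for part in msg.walk():
        # walk() also yields multipart containers; only leaf text parts matter here.
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes after transfer decoding
        if payload is None:
            continue
        # Assumption: fall back to GB18030 (a superset of GB2312/GBK) when no charset is declared.
        charset = part.get_content_charset() or "gb18030"
        chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)
```

Parsing a raw message with `message_from_string(raw)` and passing the result to both functions yields the decoded subject and body text ready for segmentation. Using `errors="replace"` keeps a mislabeled charset from aborting the whole pipeline, at the cost of some replacement characters.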
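The abstract does not explain how its index-based segmentation works. As a purely hypothetical illustration, the sketch below indexes the dictionary by first character and uses that index to drive forward maximum matching, the classic mechanical method. At each position it tries only word lengths up to the longest dictionary word that starts with the current character, rather than a single global maximum length, and it checks candidates only against words sharing that first character. `FirstCharIndex` and the sample dictionary are invented for this example.

```python
from collections import defaultdict


class FirstCharIndex:
    """Hypothetical dictionary index: words grouped by their first character,
    with the longest word length recorded for each group."""

    def __init__(self, words):
        self.words_by_first = defaultdict(set)
        self.max_len = defaultdict(int)
        for w in words:
            self.words_by_first[w[0]].add(w)
            self.max_len[w[0]] = max(self.max_len[w[0]], len(w))

    def segment(self, text: str) -> list[str]:
        """Forward maximum matching that consults only words sharing the
        current first character; unmatched characters become single tokens."""
        tokens = []
        i = 0
        while i < len(text):
            ch = text[i]
            candidates = self.words_by_first.get(ch, ())
            longest = min(self.max_len.get(ch, 1), len(text) - i)
            # Try the longest length allowed by the index first; length 1 always succeeds.
            for length in range(longest, 0, -1):
                piece = text[i:i + length]
                if length == 1 or piece in candidates:
                    tokens.append(piece)
                    i += length
                    break
        return tokens
```

For example, with the dictionary 中文, 中文邮件, 邮件, 垃圾, 垃圾邮件, 过滤, the text 中文垃圾邮件过滤 is segmented as 中文 / 垃圾邮件 / 过滤.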
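For reference, the traditional Mutual Information score that the dissertation takes as its starting point can be estimated from document counts. The sketch below assumes a spam/ham corpus in which each mail is a set of segmented terms, and it ranks terms by their maximum MI over the classes. It does not reproduce the improvements the thesis proposes; `mutual_information` and `select_features` are illustrative names.

```python
import math


def mutual_information(docs, labels, term, cls):
    """Traditional mutual information between a term and a class,
    MI(t, c) = log( P(t, c) / (P(t) P(c)) ), estimated from document counts:
        A = docs of class c containing t
        B = docs of other classes containing t
        C = docs of class c not containing t
        N = total number of docs
    which gives MI ~= log( A * N / ((A + B) * (A + C)) )."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if y == cls and term in d)
    b = sum(1 for d, y in zip(docs, labels) if y != cls and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y == cls and term not in d)
    if a == 0:
        return float("-inf")  # the term never co-occurs with this class
    return math.log(a * n / ((a + b) * (a + c)))


def select_features(docs, labels, k):
    """Score each vocabulary term by its maximum MI over all classes; keep the top k."""
    vocab = set().union(*docs)
    classes = set(labels)
    scores = {t: max(mutual_information(docs, labels, t, c) for c in classes) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for a stable result.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]
```

A toy corpus illustrates one commonly cited weakness of this formula. Take the mails {免费, 中奖}, {免费, 发票}, {会议, 通知} and {会议, 免费}, labeled spam, spam, ham and ham. Then 中奖, which appears in only one spam mail, and 会议, which appears in every ham mail, both score log 2 for their class, because the score ignores how often a term actually occurs.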
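Once features have been selected, each mail can be scored with a Bayes classifier. The sketch below is a standard multinomial naive Bayes with Laplace (add-one) smoothing over term-frequency counts, which corresponds to the ordinary vector-space representation. The improved representation the thesis proposes is not described in the abstract and is not modeled here. The `NaiveBayes` class and its `fit`/`predict` methods are hypothetical names. Log probabilities are summed instead of multiplying raw probabilities, which avoids floating-point underflow on long mails.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over a fixed feature set.
    Each mail is a list of segmented terms; only terms in `features` are counted."""

    def __init__(self, features):
        self.features = set(features)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        v = len(self.features)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Term frequencies of selected features within this class.
            counts = Counter(t for d in class_docs for t in d if t in self.features)
            total = sum(counts.values())
            # Add-one smoothing: unseen features still get non-zero probability.
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / (total + v)) for t in self.features
            }
        return self

    def predict(self, doc):
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.features
            )
        return max(self.classes, key=score)
```

Trained on the spam mails [免费, 中奖, 免费] and [免费, 发票] and the ham mails [会议, 通知] and [会议, 报告], the classifier labels [免费, 中奖] as spam and [会议, 通知] as ham.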
Keywords/Search Tags: Spam mail, Mail decoding, Chinese Phrase Segmentation, Feature selection, Training study, Bayes classification, Mutual study