Implementation Of Content-Based Chinese Spam Filter Using BP Neural Net

Posted on: 2008-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: P G Li
GTID: 2178360215996122
Subject: Computer application technology
Abstract/Summary:
With the rapid spread of Internet usage, email has become an important communication tool. At the same time, junk mail (spam) is flooding mail servers and user mailboxes. Spam accounts for up to 50 percent of all user email, and it consumes large amounts of Internet resources and user time. Spam that carries viruses is especially harmful: it can disable normal Internet services or attack users' computers. Anti-spam technology is therefore very important for keeping Internet services, and email in particular, running normally.

At present, the most widely used anti-spam technology is content-based spam filtering, which applies text categorization to the content of an email in order to filter out spam. Most existing content-based filters do not work well for Chinese email because of essential differences between Chinese and Western languages; for example, Chinese text has no explicit delimiter between words. Filtering Chinese email therefore requires additional processing, namely Chinese word segmentation.

Drawing on the related literature and on an in-depth study and analysis of existing content-based filtering technologies, and taking into account features of the ANN (Artificial Neural Network) such as self-learning and self-organization, this paper proposes using an ANN as the text classifier and using a GA (Genetic Algorithm) to optimize the architecture of the ANN. In addition, the ICTCLAS system is used to perform Chinese word segmentation so that the filter works well with Chinese email.

First, this paper introduces the fundamentals of email and spam and analyzes existing anti-spam technologies. It then explains the theory behind content-based email filtering. Finally, it designs and implements an ANN-based content filter for Chinese email.
Keywords/Search Tags: Spam, Chinese word segmentation, BP Neural Networks, Genetic Algorithm
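
The pipeline outlined in the abstract (segmented words, then a BP neural network as the text classifier) can be illustrated with a minimal sketch. This is not the thesis implementation: the token lists stand in for the output of a word segmenter such as ICTCLAS (no segmenter is called here), the binary bag-of-words features, the network size, the learning rate, and the names `build_vocab`, `vectorize`, and `BPNet` are all assumptions made for illustration, and the genetic-algorithm step that optimizes the architecture is omitted.

```python
import math
import random


def build_vocab(docs):
    """Map every distinct token to a feature index."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Binary bag-of-words vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """One hidden layer, sigmoid units, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Error terms for squared-error loss with sigmoid activations.
        d_out = (y - target) * y * (1.0 - y)
        d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        # Gradient-descent updates, output layer first, then hidden layer.
        for j in range(len(h)):
            self.w2[j] -= lr * d_out * h[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * d_hid[j] * x[i]
            self.b1[j] -= lr * d_hid[j]
        self.b2 -= lr * d_out

    def predict(self, x):
        return self.forward(x)[1]


# Toy data: hypothetical, already-segmented emails (label 1 = spam, 0 = ham).
spam_docs = [["免费", "中奖", "点击"], ["发票", "优惠", "免费"]]
ham_docs = [["会议", "明天", "报告"], ["论文", "修改", "意见"]]

vocab = build_vocab(spam_docs + ham_docs)
samples = ([(vectorize(d, vocab), 1.0) for d in spam_docs]
           + [(vectorize(d, vocab), 0.0) for d in ham_docs])

net = BPNet(n_in=len(vocab), n_hidden=4)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target, lr=0.5)
```

After training, `net.predict(vectorize(tokens, vocab))` returns a score between 0 and 1, and a threshold such as 0.5 would turn it into a spam/ham decision. A GA in the spirit of the abstract could search over settings such as `n_hidden`, scoring each candidate network by its validation accuracy.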