Study And Implementation On Chinese E-mail Filter System

Posted on: 2008-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: L X Li
Full Text: PDF
GTID: 2178360215451035
Subject: Computer application technology
Abstract/Summary:
E-mail has become an important means of daily communication. Junk mail, however, wastes users' time and resources, consumes network bandwidth and server storage space, and threatens Internet security. This thesis studies that problem.

Commonly used countermeasures include blacklists and whitelists, keyword search, and hand-written filter rules. In practice, these methods do not work well enough. Research has therefore turned to content-based text classification, of which Naive Bayes is the typical example. Building on text classification techniques and Bayesian theory, this thesis proposes a Bayesian e-mail filtering technique based on rough set attribute reduction. The technique improves rough set attribute reduction and makes Naive Bayes more practical by taking the dependence between attributes into account in the Bayesian classification. The system also incorporates features of the e-mail itself to improve filtering.

Focusing on Bayesian filtering for Chinese e-mail, the thesis introduces the following key techniques and methods:

(1) Compute the MD5 digest of each message and count how often the same e-mail appears according to its digest. When the same e-mail appears more than a threshold β times, it is considered junk mail.

(2) Segment the Chinese text of the mail into words using the N-shortest-paths method, then represent the text with an improved Vector Space Model.

(3) For feature selection, propose a reduction algorithm based on the significance and dependence of rough set attributes. Without losing the original information, it considers the dependence between condition attributes and decision attributes, as well as among the decision attributes, in order to obtain a minimal attribute reduct.

(4) For Bayesian classification, the Naive Bayes algorithm assumes that features are independent of one another. In fact, the features within a given message are related, and when the independence assumption is violated the Bayesian classifier becomes considerably less precise. This thesis takes the dependence into account and relaxes the variable-independence restriction, which widens the method's scope of application. The approach is also evaluated through practical experiments and analysis.
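As an illustration of step (1), the following is a minimal Python sketch of MD5-based duplicate counting. It is not the thesis's implementation: the class name `DuplicateMailCounter`, the parameter name `beta`, and the choice to hash the raw message bytes are all assumptions made for this example. The abstract does not say which parts of the message are hashed.

```python
import hashlib
from collections import Counter


class DuplicateMailCounter:
    """Counts identical messages by MD5 digest and flags frequent ones."""

    def __init__(self, beta: int):
        # beta is a hypothetical name for the threshold β in the abstract.
        self.beta = beta
        self.counts = Counter()

    def observe(self, raw_message: bytes) -> bool:
        """Record one message; return True if it has now been seen more than beta times."""
        # Assumption: the digest is taken over the raw message bytes.
        digest = hashlib.md5(raw_message).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest] > self.beta


if __name__ == "__main__":
    counter = DuplicateMailCounter(beta=2)
    for i in range(4):
        flagged = counter.observe(b"Limited offer, click here")
        print(i + 1, flagged)
```

Because the digest changes when even one byte differs, a scheme like this only catches exact copies of a mass mailing. That fits the abstract's use of it as one technique alongside content-based classification.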
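The attribute dependence mentioned in step (3) can be illustrated with the textbook rough set dependency degree, γ = |POS_C(D)| / |U|: the fraction of objects whose condition-attribute values determine the decision value unambiguously. The Python sketch below computes only this standard quantity. It is not the reduction algorithm proposed in the thesis, and the function name, the list-of-dicts table layout, and the toy attributes `free` and `meeting` are assumptions made for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Return |POS_C(D)| / |U| for a decision table given as a list of dicts."""
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their condition-attribute values.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    # A class lies in the positive region if all its objects share one decision value.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


# Toy decision table (made-up data): does the mail contain "free" / "meeting"?
TABLE = [
    {"free": 1, "meeting": 0, "label": "spam"},
    {"free": 1, "meeting": 0, "label": "spam"},
    {"free": 0, "meeting": 1, "label": "ham"},
    {"free": 1, "meeting": 1, "label": "ham"},
    {"free": 1, "meeting": 1, "label": "spam"},
]

if __name__ == "__main__":
    print(dependency_degree(TABLE, ["free", "meeting"], "label"))
    print(dependency_degree(TABLE, ["free"], "label"))
```

In reduct-style feature selection, an attribute whose removal leaves this value unchanged is a candidate for dropping. How the thesis combines this with attribute significance is not specified in the abstract.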
Keywords/Search Tags: Chinese E-mail filter, Vector space model, Rough set, Attribute reduction, Bayesian classification method