
Short Text-based E-mail Filtering Technology Research

Posted on: 2010-03-21, Degree: Master, Type: Thesis
Country: China, Candidate: L Zhang, Full Text: PDF
GTID: 2208360275950017, Subject: Computer application technology
Abstract/Summary:
E-mail is a primary means of communication on the Internet. At the same time, spam (also called "junk mail") has become pervasive online and causes considerable trouble for users, so preventing and controlling spam effectively is both important and practical. Existing anti-spam filtering systems rely mainly on their own rule sets because content-based spam filtering technology is still immature, and filtering results are therefore not ideal. To filter Chinese spam e-mail effectively, we apply text emotion categorization to spam filtering. We discuss existing content-based spam filtering methods and introduce the principles of e-mail and e-mail transfer. Based on an in-depth study of the theory and application of text emotion categorization algorithms, we propose a spam filtering solution based on text emotion analysis. After segmentation of the e-mail content, feature selection, and feature weight calculation, each e-mail can be represented as a vector in a vector space. The Part of Speech Transfer-Form, which serves as the system's knowledge, is obtained by inductive learning from a training corpus. To evaluate a real Chinese text, keywords are extracted with a method based on Pronunciation Matching, and an evaluation value is computed from the system knowledge. This value measures how similar the context-dependence pattern of a keyword matched in a real Chinese text on the Internet is to that of the same keyword as learned from the corpus. If the evaluation value exceeds the preset threshold, the Chinese information is blocked. On this basis, we developed a mail filtering system.
The complete system was tested with different numbers of features, different numbers of training e-mails, and different spam thresholds. The highest recall reached 94.8%, the highest accuracy reached 96.7%, and the mean value exceeded 85%, which is a satisfactory result.
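The vector-space representation and the threshold decision described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: it assumes the e-mails are already segmented into tokens, swaps in plain TF-IDF weighting with cosine similarity to a spam centroid in place of the Part of Speech Transfer-Form evaluation, and uses hypothetical names (`idf_table`, `tfidf_vector`, `is_spam`), toy data, and an arbitrary threshold of 0.5.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Represent a token list as a sparse vector: term -> TF-IDF weight."""
    tf = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average a list of sparse vectors."""
    acc = Counter()
    for vec in vectors:
        for t, w in vec.items():
            acc[t] += w
    return {t: w / len(vectors) for t, w in acc.items()}


def is_spam(tokens, idf, spam_centroid, threshold=0.5):
    """Flag the e-mail when its evaluation value exceeds the preset threshold."""
    score = cosine(tfidf_vector(tokens, idf), spam_centroid)
    return score > threshold, score


# Toy, pre-segmented training e-mails (hypothetical data, not from the thesis).
spam_train = [["free", "prize", "click"], ["free", "offer", "click", "now"]]
ham_train = [["meeting", "agenda", "monday"], ["project", "report", "attached"]]

idf = idf_table(spam_train + ham_train)
spam_centroid = centroid([tfidf_vector(d, idf) for d in spam_train])

flag_spam, score_spam = is_spam(["free", "prize", "now"], idf, spam_centroid)
flag_ham, score_ham = is_spam(["meeting", "report", "monday"], idf, spam_centroid)
print(flag_spam, flag_ham)
```

Raising the threshold makes the filter flag fewer e-mails and lowering it makes the filter flag more, which is why the abstract reports tests over several threshold values.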
Keywords/Search Tags: Spam Filter, Text Emotion Categorization, Learning of Corpus, Part of Speech Transfer-Form