Research on Text Mining-Based Spam Filtering

Posted on: 2006-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Gong
Full Text: PDF
GTID: 2168360155470124
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, e-mail has come into wide use and has made daily life considerably more convenient. As a byproduct, however, spam e-mail causes a great deal of trouble for users, network administrators, and ISPs. The problem has grown tremendously over the past few years and has attracted the attention of many researchers.

In general, spam e-mails are messages forced into mailboxes without the users' permission. Spam sent with techniques such as multicast must be countered by technical means. Current anti-spam techniques mainly comprise spam filtering, security management of mail servers, and research on improving SMTP. Of these, filtering is the principal anti-spam technique. This thesis studies mail filtering based on data mining. After surveying the current state and development trends of mail filtering technology, it proposes applying text classification algorithms to spam filtering, exploiting the fact that an e-mail can be converted to text.

The thesis is organized as follows. Chapter 1 briefly introduces the harm caused by spam, its current state, and existing anti-spam techniques. Chapter 2 sets out the basic steps of filtering mail with text mining. Chapter 3 studies how semi-structured e-mails are converted to structured text data during mail preprocessing, in particular how the latent features of e-mails are recognized. Chapter 4 studies classification methods and strategies for spam, and presents a classification method that combines several classifiers while giving priority to Bayesian classification. In particular, it improves the traditional Bayesian classification method to address the "false positive" problem in spam filtering. Chapter 5 constructs a model of a spam filtering system based mainly on message content. Finally, the thesis summarizes its work and outlines directions for future research.
Keywords/Search Tags: filtering, text mining, preprocessing, classification
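
The abstract does not say how the thesis modifies the Bayesian classifier to reduce false positives, so the following is only a minimal sketch of one common approach, not the author's method: a multinomial naive Bayes text classifier with Laplace smoothing that labels a message as spam only when the estimated spam probability exceeds a high threshold. The class name `NaiveBayesSpamFilter`, the `spam_threshold` parameter, the tokenizer, and the toy training messages are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with a conservative spam threshold (illustrative)."""

    def __init__(self, spam_threshold=0.9):
        # Assumption: a threshold above 0.5 trades some missed spam
        # for fewer legitimate messages being flagged (false positives).
        self.spam_threshold = spam_threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        """Add one labeled message; label must be 'spam' or 'ham'."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_joint(self, tokens, label):
        """Return log P(label) + sum of log P(token | label), Laplace-smoothed."""
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are skipped
                log_prob += math.log((self.word_counts[label][token] + 1) / denom)
        return log_prob

    def spam_probability(self, text):
        """Estimate P(spam | text); needs at least one message per class."""
        tokens = tokenize(text)
        diff = self._log_joint(tokens, "ham") - self._log_joint(tokens, "spam")
        if diff > 700:  # guard against overflow in math.exp
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))

    def is_spam(self, text):
        """Flag the message only when the spam probability clears the threshold."""
        return self.spam_probability(text) > self.spam_threshold


# Toy usage with made-up training messages.
spam_filter = NaiveBayesSpamFilter(spam_threshold=0.9)
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch at noon tomorrow", "ham")
print(spam_filter.is_spam("free money prize"))
print(spam_filter.is_spam("free lunch"))
```

With a threshold of 0.5 this sketch would flag the ambiguous message "free lunch"; raising the threshold lets such borderline mail through, which is the kind of trade-off a false-positive-aware filter has to make.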