Design and Implementation of a Spam Filtering System Based on the Bayesian Algorithm

Posted on: 2015-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2268330428490979
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of the Internet, e-mail has become an important means of everyday communication. Because e-mail is easy to send and receive, simple, and low in cost, many Internet users choose it as their preferred way to stay in contact. As e-mail has developed, however, users often receive messages from senders or addresses they do not recognize. These messages carry all kinds of advertising, such as free calls and discounted merchandise, as well as various kinds of illegal information. They may have nothing to do with the recipient's work or life, or may be actively unwelcome, yet similar messages persistently fill the mailbox every day, disturb the user's life, and sometimes carry viruses that can infect and paralyze a computer. E-mail that forces its way into a user's mailbox in this manner is called spam (UBE, Unsolicited Bulk Email), or commercial promotional mail (Unsolicited Commercial Email, meaning mail whose main content is the promotion of goods).
Given the great harm spam causes to modern society, studying how to suppress it more effectively has become increasingly urgent, and anti-spam technology has long been a hot topic of international discussion. Building on previous research and on a systematic study of the theory and of spam filtering methods used internationally, this paper focuses on Bayesian algorithms for classifying spam. The paper first introduces the development and working principles of e-mail, together with several commonly used e-mail protocols and standards, such as MIME (Multipurpose Internet Mail Extensions) and SMTP (Simple Mail Transfer Protocol). Second, it introduces rule-based spam filtering, with rules based on the sender's e-mail address, the recipient's e-mail address, blacklists and whitelists, the message subject, and so on. Together these rules form the first line of defense against spam. Finally, it focuses on content-based spam filtering and makes some improvements to address shortcomings of the Bayesian algorithm. Several Chinese word segmentation methods are presented, including dictionary-based segmentation, the N-gram method, and manual segmentation. A dedicated lexicon is then built, the corpus is systematically learned, and Naive Bayes theory is used to classify received e-mails, which are ultimately presented to the user as either spam or normal mail.
Lastly, combining the theory with the related technologies, this paper presents a simple Bayesian spam classification simulation in which spam filtering is carried out on a learned sample of e-mails. The proportion of spam to legitimate messages follows the share of spam in users' e-mail given in the "Chinese Anti-Spam Survey Report", and the experimental data reflect the effectiveness of the method at intercepting spam.
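The content-based step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes character bigrams as the N-gram segmentation, add-one (Laplace) smoothing, and a four-message toy corpus invented for illustration. The names `char_bigrams` and `NaiveBayesSpamFilter` are hypothetical, and the sketch assumes each class has been trained on at least one message before classification.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character 2-grams, ignoring whitespace.

    This stands in for the N-gram segmentation method; it needs no dictionary.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over character bigrams (illustrative only)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message ('spam' or 'ham') to the model."""
        self.token_counts[label].update(char_bigrams(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(token | label), add-one smoothed."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total = sum(self.token_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in char_bigrams(text):
            count = self.token_counts[label][tok]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        """Pick the label with the higher posterior log score."""
        spam_score = self.log_score(text, "spam")
        ham_score = self.log_score(text, "ham")
        return "spam" if spam_score > ham_score else "ham"


# Toy corpus, invented for illustration (not data from the thesis).
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("免费电话 优惠商品", "spam")         # "free calls, discounted goods"
spam_filter.train("免费 优惠 打折商品", "spam")        # "free, discount, marked-down goods"
spam_filter.train("明天上午开会 请准时参加", "ham")    # "meeting tomorrow morning, be on time"
spam_filter.train("项目报告已发送 请查收", "ham")      # "project report sent, please check"

print(spam_filter.classify("免费优惠商品"))            # "free discounted goods"
print(spam_filter.classify("请准时参加明天的会"))      # "please attend tomorrow's meeting on time"
```

Scores are summed in log space so that long messages do not underflow to zero, and the smoothing keeps an unseen bigram from forcing a class probability to zero. A real system would train on a much larger corpus, and a dictionary-based segmenter could replace `char_bigrams` without changing the classifier.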
Keywords/Search Tags: e-mail protocols, Bayesian filters, Chinese word segmentation, message classification