
Research On Weighted Bayesian Mail Filtering Method

Posted on: 2017-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2308330503986806  Subject: Computational Mathematics
Abstract/Summary:
E-mail makes communication convenient, but the large volume of spam has created a new problem. Spam consumes a large amount of network resources, invades privacy, and is a considerable nuisance to users, so studying how to filter spam has great practical significance. The research in this paper is based on the Bayesian spam classification algorithm. Built on Bayes' theorem in statistics, the algorithm computes the posterior probability from the prior probability, so that spam can be picked out from a large volume of mail. The Bayesian algorithm has been widely adopted in the field of e-mail filtering based on text classification.

This paper first introduces the research background of spam filtering, the state of research in China and abroad, and the common filtering methods. To establish a standard for spam filtering, the paper then introduces Bayes' theorem, the corpus used in this paper, and the evaluation metrics commonly used in the text classification literature. The research focuses on the effectiveness of the naive Bayesian classification algorithm and establishes a fingerprint vector method and the CHI_XIG method. The paper then analyzes the advantages of the new methods for spam classification, and simulation experiments show that the naive Bayesian algorithm based on feature expression and feature selection significantly improves filtering performance.

The study found that the mail header and the mail body affect classification differently, so the paper establishes a Bayesian spam filtering model in which the header and the body are given different weights. In practical use, the header and body weights are derived from historical data, and the weighted Bayesian mail filtering model computes a comprehensive score that serves as the quantitative basis for deciding the e-mail type. The weighted Bayesian spam filtering model has an advantage in classifying spam.
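
The weighted scoring idea described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the model's formula, so the sketch assumes that the comprehensive score is a weighted sum of naive Bayes log-odds computed separately for the header and the body. The function names, the Laplace smoothing, and the weights `w_header` and `w_body` are all hypothetical; in the thesis the weights are derived from historical data by a procedure the abstract does not specify.

```python
import math
from collections import Counter


def train(docs):
    """Count tokens per class. docs is a list of (tokens, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for tokens, is_spam in docs:
        counts[is_spam].update(tokens)
        n_docs[is_spam] += 1
    return counts, n_docs


def spam_log_odds(tokens, model):
    """Naive Bayes log-odds, log P(spam | tokens) - log P(ham | tokens).

    Uses add-one (Laplace) smoothing, which is an assumption of this sketch.
    """
    counts, n_docs = model
    vocab_size = len(set(counts[True]) | set(counts[False]))
    total = {c: sum(counts[c].values()) for c in (True, False)}
    # Smoothed prior odds: the shared denominator cancels in the ratio.
    score = math.log(n_docs[True] + 1) - math.log(n_docs[False] + 1)
    for t in tokens:
        p_spam = (counts[True][t] + 1) / (total[True] + vocab_size)
        p_ham = (counts[False][t] + 1) / (total[False] + vocab_size)
        score += math.log(p_spam) - math.log(p_ham)
    return score


def weighted_score(header_tokens, body_tokens, header_model, body_model,
                   w_header=0.6, w_body=0.4):
    """Comprehensive score as a weighted sum of header and body log-odds.

    The weights are placeholders, not values from the thesis.
    """
    return (w_header * spam_log_odds(header_tokens, header_model)
            + w_body * spam_log_odds(body_tokens, body_model))


# Toy training data, invented purely for illustration.
header_model = train([(["free", "offer"], True),
                      (["meeting", "agenda"], False)])
body_model = train([(["win", "money", "now"], True),
                    (["project", "report", "attached"], False)])

spam_like = weighted_score(["free"], ["money"], header_model, body_model)
ham_like = weighted_score(["meeting"], ["report"], header_model, body_model)
```

Under these assumptions a positive score leans toward spam and a negative score toward legitimate mail; a real filter would also need a decision threshold chosen on held-out data.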
Keywords/Search Tags: Bayesian theorem, mail filtering, feature selection, feature expression, weighted