
Research And Design Technology Of Chinese Spam Filtering Based On The Minimum Risk

Posted on: 2013-09-14  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Li  Full Text: PDF
GTID: 2248330395985080  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, email has become one of the most popular communication tools because it is quick, cheap and simple. Spam, however, remains a problem: it consumes network resources and wastes a great deal of users' time and energy, so solving the spam problem is of practical significance. Existing Bayesian algorithms use the Bernoulli model to process spam features and apply a low decision threshold when filtering spam, which leads to a low utilization rate of spam features and increases the risk of misclassifying legitimate mail as spam. This paper uses the polynomial (multinomial) model to improve the utilization rate of feature items, and raises the standard for judging mail as spam in order to reduce the risk of misclassifying legitimate mail.

This paper studies and compares existing spam filtering technologies in depth. It then discusses Chinese word segmentation, feature selection and spam filtering algorithms. According to different optimization objectives, it compares several typical spam filtering algorithms and points out their defects. To address the low feature utilization rate of existing Bayesian algorithms, the paper uses the polynomial model to handle mail features: the model computes the probability of each text feature of a mail, and the important features are then distinguished according to the calculated probabilities. To address the risk that existing Bayesian algorithms misclassify legitimate mail, the paper proposes a Bayes algorithm based on minimum risk. The algorithm raises the judging standard so that only mail that is highly similar to spam is filtered out, while mail with lower similarity to spam is let through, because such mail may contain legitimate messages that are very important to the user.

Building on the advantages of the minimum-risk Bayes polynomial algorithm, the paper implements a spam filtering system based on minimum risk. During Chinese word segmentation and feature extraction, the system uses the Bayes polynomial model to calculate the probability of mail features and distinguish the important ones, which improves the utilization rate of spam features. At the same time, it applies the minimum-risk Bayes filtering strategy to reduce the risk of misclassifying legitimate mail. Simulation results show that, under certain conditions, the polynomial model gives the Bayes algorithm better performance than the Bernoulli model, while also reducing the risk of misclassifying legitimate mail.
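The two ideas in the abstract, counting every occurrence of a feature (the multinomial model) and raising the bar for a spam verdict (the minimum-risk decision), can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy corpus, the add-one smoothing and the cost values `cost_fp` and `cost_fn` are all assumptions made for illustration, and Chinese word segmentation is assumed to have already produced the token lists.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label) pairs, label is "spam" or "ham"."""
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)  # multinomial: repeated tokens all count
        n_docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, n_docs, vocab

def log_posterior_odds(tokens, model):
    """log P(spam | tokens) - log P(ham | tokens), add-one smoothed."""
    counts, n_docs, vocab = model
    score = math.log(n_docs["spam"]) - math.log(n_docs["ham"])  # prior odds
    n_spam = sum(counts["spam"].values())
    n_ham = sum(counts["ham"].values())
    v = len(vocab)
    for t in tokens:
        if t not in vocab:
            continue  # ignore tokens never seen in training
        p_spam = (counts["spam"][t] + 1) / (n_spam + v)
        p_ham = (counts["ham"][t] + 1) / (n_ham + v)
        score += math.log(p_spam) - math.log(p_ham)
    return score

def classify(tokens, model, cost_fp=9.0, cost_fn=1.0):
    """Minimum-risk rule: call a mail spam only if the expected cost of
    blocking it (cost_fp * P(ham|x)) is below the expected cost of
    delivering it (cost_fn * P(spam|x)). The costs are assumed values."""
    threshold = math.log(cost_fp / cost_fn)
    return "spam" if log_posterior_odds(tokens, model) > threshold else "ham"

# Hypothetical toy corpus (tokens stand in for segmented Chinese words).
docs = [
    (["free", "prize", "free"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "report"], "ham"),
    (["project", "meeting", "prize"], "ham"),
]
model = train(docs)
print(classify(["free", "free", "prize"], model))     # strongly spam-like
print(classify(["win", "prize"], model))              # mildly spam-like
print(classify(["win", "prize"], model, cost_fp=1.0)) # equal costs
```

With equal costs the rule reduces to the usual maximum-posterior decision; making `cost_fp` larger than `cost_fn` lets borderline mail through, which is the trade-off the abstract describes.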
Keywords/Search Tags: Mail filtering, Feature extraction, Text classification, Bayesian polynomial model, Risk assessment