
Application Of Improved Naive Bayes Algorithm In Spam Filtering

Posted on: 2021-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Wang
GTID: 2518306194492684
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, e-mail has become more and more popular. However, its reliability and security have attracted attention, because junk e-mail, phishing e-mail and harassing e-mail have greatly affected people's lives. According to statistics from the national anti-spam reporting center, more than half of users have wasted considerable time and resources because of spam, and half of users have suffered some economic loss because of it. Spam also often carries viruses, which causes further trouble for users.

The spam filtering technologies in common use include identity authentication, behavior pattern recognition, white lists and keyword filtering. Filtering has its own cost, however: misjudging a legitimate message as spam can also cause economic loss to the user, so many people are unwilling to turn on spam filtering at all. The Naive Bayes classification algorithm has become the most popular technique for this task, and its good mail classification performance has attracted the attention of many researchers. Because of the limitations of Naive Bayes, and in order to improve the accuracy of spam classification, this paper proposes combining active learning with an improved Naive Bayes algorithm. The main research work of this paper is as follows:

(1) If the training samples themselves are wrongly labeled, continuous updating and iteration during training causes the error to accumulate, producing a classifier that is prone to misclassification. Therefore, this paper combines K-Nearest Weighted Naive Bayes (K-LWNB) with active learning: the most valuable samples are selected and labeled manually, which improves the accuracy of the samples themselves and reduces the error rate of the classifier. Compared with the traditional Naive Bayes algorithm, the K-nearest neighbor weighted Naive Bayes algorithm classifies spam more effectively and improves the accuracy and precision of mail classification.

(2) Ham (normal mail) and spam (junk mail) are used as the data sample sets. The text content is parsed into word vectors, stop words are removed, and key feature words are extracted. The parsed entries are then checked to ensure that the parsing is correct. Finally, the conditional probabilities of the individual feature keywords, treated as independent, are calculated and used to judge whether a message is spam.
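The pipeline in item (2) can be illustrated with a minimal sketch in Python. It implements only the traditional Naive Bayes baseline with add-one (Laplace) smoothing, not the K-LWNB weighting proposed in the thesis, whose details the abstract does not give. The stop-word list, the four training messages and all function names are hypothetical and chosen for illustration. The `most_uncertain` helper assumes that "most valuable" samples means those with the smallest gap between class scores (uncertainty sampling); the abstract does not state which selection criterion is actually used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "to", "is", "you", "your", "at", "for", "and"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def train(samples):
    """samples: list of (text, label). Returns per-class word counts,
    per-class document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for w in tokenize(text):
            counts[w] += 1
            vocab.add(w)
    return word_counts, doc_counts, vocab

def log_posteriors(text, model):
    """Unnormalized log P(label | text) under the independence assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    scores = log_posteriors(text, model)
    return max(scores, key=scores.get)

def most_uncertain(unlabeled, model, k=1):
    """Assumed active-learning criterion: pick the k texts whose two best
    class scores are closest, as candidates for manual labeling."""
    def margin(text):
        best, second = sorted(log_posteriors(text, model).values(), reverse=True)[:2]
        return best - second
    return sorted(unlabeled, key=margin)[:k]

# Made-up training data, not taken from the thesis.
model = train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("project report attached", "ham"),
])
print(classify("claim your free money", model))     # spam
print(classify("project meeting tomorrow", model))  # ham
print(most_uncertain(["free money now", "hello world"], model))
```

Working in log space avoids numeric underflow when many small conditional probabilities are multiplied together. In an active-learning loop, the messages returned by `most_uncertain` would be labeled by hand, added to the training set, and the model retrained.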
Keywords/Search Tags: Spam classification, naive Bayes algorithm, active learning, feature keywords