Application Of An Improved Naive Bayesian Algorithm In The Identification Of Spam Message User

Posted on: 2018-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Li
GTID: 2359330536477762
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the popularity of text messaging, spam messages have become an increasingly serious problem. Spam text messages include hacking and fraudulent messages as well as a variety of illegal advertising messages. According to statistics, China's mobile phone users receive on average more than eight spam messages per week, of which 70% are illegal advertising messages. Spam messages have caused serious social harm, provoking strong dissatisfaction among the majority of users and wide public attention. To build a green network, develop business, and provide better service for customers, it is imperative to identify spam messages through both technical and management means.

Spam message user identification is, at bottom, a classification problem. Among current classification algorithms, naive Bayes is simple and effective, and it is one of the important classification algorithms in the field of data mining. It is a probability-based classification method that is widely used in various fields. The naive Bayesian algorithm assumes that the attributes are conditionally independent given the class, but very few problems in practical applications satisfy this assumption, which greatly reduces the applicability of the algorithm. In this paper, we construct an attribute filter through filtered attribute selection and related attribute reduction, select the appropriate attributes for spam message modeling, and then improve the naive Bayesian algorithm on the classification threshold to improve the classification accuracy of the model. The main research contents include:

(1) Filtered attribute selection. Based on filtered attribute selection, the initial attributes for spam message modeling are filtered, and the attributes that have a significant impact on the target variable are selected. The paper analyzes the influence of modeling attribute selection on the classification results, introduces the principle of filtered attribute selection in detail, and implements filtered attribute selection in code. The method is then applied to consumer behavior data related to spam messages as an example.

(2) Related attribute reduction. The naive Bayesian algorithm requires that the attributes be independent of each other given the class, but the spam message modeling data set can hardly meet such a strict requirement. We introduce the common measures of attribute correlation and, building on the filtered attribute selection of the first step, propose a filtering method that makes the attributes independent. This improves the applicability of the naive Bayesian algorithm, so that the identification of spam message customers can be modeled with it.

(3) Improving the naive Bayesian algorithm on the classification threshold. In the traditional naive Bayesian algorithm, a sample is assigned to the first class when its probability of belonging to the first class is greater than its probability of belonging to the second. However, when the numbers of samples in the classes are extremely uneven, misclassification occurs very easily, which reduces the accuracy of the model. We improve the classification rule by introducing a classification threshold and search for the most suitable threshold based on the spam message modeling data, raising the classification accuracy of the model from an initial 67.1% to 90.7%.
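
The threshold idea in step (3) can be illustrated with a minimal sketch. This is not the thesis's code: the Gaussian likelihood, the synthetic two-attribute data, the 1%-step threshold grid, and all function names (`fit_gaussian_nb`, `posterior_positive`, `best_threshold`, `make_data`) are assumptions made for illustration. It fits a two-class naive Bayes model on imbalanced data and compares the default rule (assign to class 1 when its posterior is at least 0.5) with a threshold chosen to maximize accuracy.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-attribute mean/variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def posterior_positive(model, x):
    """Return P(class 1 | x) under the naive independence assumption."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = total
    top = max(log_joint.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    return weights[1] / sum(weights.values())


def accuracy_at(scores, y, threshold):
    """Accuracy when a sample is labeled 1 iff its posterior >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def best_threshold(scores, y, grid):
    """Pick the grid value with the highest accuracy (first one on ties)."""
    return max(grid, key=lambda t: accuracy_at(scores, y, t))


def make_data(n_normal, n_spam, seed=0):
    """Synthetic, heavily imbalanced data: class 1 plays the role of spam users."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_normal)]
    X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(n_spam)]
    y = [0] * n_normal + [1] * n_spam
    return X, y


X, y = make_data(950, 50)
model = fit_gaussian_nb(X, y)
scores = [posterior_positive(model, x) for x in X]
grid = [i / 100 for i in range(1, 100)]  # candidate thresholds 0.01 .. 0.99
t_star = best_threshold(scores, y, grid)
print("default 0.5 accuracy:", accuracy_at(scores, y, 0.5))
print("tuned threshold:", t_star, "accuracy:", accuracy_at(scores, y, t_star))
```

For brevity the sketch tunes the threshold on the same data used to fit the model; in practice a held-out validation set would give a more honest estimate. With strongly imbalanced classes, a criterion other than raw accuracy (for example recall on the minority class) may also be worth considering.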
Keywords/Search Tags:Filtered attribute selection, Attribute independent, Classification threshold, Naive Bayesian, Spam messages