Application Of An Improved Naive Bayesian Algorithm In The Identification Of Spam Message User

Posted on:2018-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Li

Full Text:PDF

GTID:2359330536477762

Subject:Probability theory and mathematical statistics

Abstract/Summary:

With the popularity of text messaging, spam messages have become an increasingly serious problem. Spam text messages include hacking and fraudulent messages as well as a variety of illegal advertising messages. According to statistics, China's mobile phone users receive more than eight spam messages per week on average, of which 70% are illegal advertising messages. Spam messages have caused serious social harm, provoking strong dissatisfaction among the majority of users and drawing wide attention from the community. To build a green network, develop business, and provide better service for customers, it is imperative to identify spam messages through both technical and management means.

Identifying spam message users is, in the final analysis, a classification problem. Among current classification algorithms, naive Bayes is simple and effective, and it is one of the important classification algorithms in the field of data mining. It is a probability-based classification method and is widely used in various fields. The naive Bayesian algorithm assumes that the attributes are conditionally independent given the class, but very few problems in practical applications satisfy this assumption, which greatly reduces the applicability of the algorithm. In this paper, we construct an attribute filter through filtered attribute selection and related attribute reduction, select the appropriate attributes for spam message modeling, and then improve the naive Bayes algorithm with respect to the classification threshold in order to raise the classification accuracy of the model. The main research contents include:

(1) Filtered attribute selection. Based on filtered attribute selection, the initial attributes for spam message modeling are filtered, and the attributes that have a significant impact on the target variable are selected. The paper analyzes the influence of modeling attribute selection on the classification results, introduces the principle of filtered attribute selection in detail, and implements filtered attribute selection in code. The method is then applied to the consumer behavior data of spam message users as an example.

(2) Reduction of related attributes. The naive Bayesian algorithm requires that the attributes be independent of each other given the class, but the spam message modeling data set can hardly meet such a strict requirement. In this paper, we introduce the common methods for measuring the relatedness of attributes and, building on the filtered attribute selection of the first step, propose a filtering method that makes the attributes independent. This improves the applicability of the naive Bayesian algorithm, so that the identification of spam message customers can be modeled with it.

(3) Improvement of the naive Bayesian algorithm with respect to the classification threshold. In the traditional naive Bayesian algorithm, a sample is assigned to the first class when its probability of belonging to the first class is greater than its probability of belonging to the second class. However, when the numbers of samples in the classes are extremely unbalanced, this rule easily leads to misclassification and reduces the accuracy of the model. In this paper, we improve the classification rule by searching for the most suitable classification threshold based on the spam message modeling data, which raises the classification accuracy of the model from the initial 67.1% to 90.7%.
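The threshold idea in step (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy data, the two binary attributes, the Laplace smoothing, and the helper names (fit_nb, posterior, accuracy_at, best_threshold) are all assumptions made for illustration. The sketch fits a categorical naive Bayes model, computes P(spam | x) for each sample, and scans candidate thresholds for the one with the highest accuracy instead of always using 0.5.

```python
from collections import Counter
import math

def fit_nb(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # counts[c][j][v] = number of class-c rows whose attribute j equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[y][j][v] += 1
            values[j].add(v)
    return {"classes": classes, "class_counts": class_counts,
            "counts": counts, "values": values,
            "alpha": alpha, "n": len(labels)}

def posterior(model, row, positive=1):
    """Return P(positive | row) under the naive independence assumption."""
    log_joint = {}
    for c in model["classes"]:
        lp = math.log(model["class_counts"][c] / model["n"])
        for j, v in enumerate(row):
            num = model["counts"][c][j][v] + model["alpha"]
            den = (model["class_counts"][c]
                   + model["alpha"] * len(model["values"][j]))
            lp += math.log(num / den)
        log_joint[c] = lp
    # Normalize in log space to avoid underflow.
    m = max(log_joint.values())
    total = sum(math.exp(lp - m) for lp in log_joint.values())
    return math.exp(log_joint[positive] - m) / total

def accuracy_at(scores, labels, threshold):
    """Accuracy when a sample is labeled spam iff its score >= threshold."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy_at(scores, labels, t))

# Hypothetical toy data: (high_volume, many_recipients); label 1 = spam user.
rows = ([(1, 1)] * 3 + [(1, 0)] * 1
        + [(0, 0)] * 12 + [(1, 0)] * 3 + [(0, 1)] * 1)
labels = [1] * 4 + [0] * 16

model = fit_nb(rows, labels)
scores = [posterior(model, r) for r in rows]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best = best_threshold(scores, labels, candidates)
print("best threshold:", best)
print("accuracy at 0.5:", accuracy_at(scores, labels, 0.5))
print("accuracy at best:", accuracy_at(scores, labels, best))
```

For brevity the threshold here is chosen on the same data used to fit the model; in practice it would normally be selected on a held-out validation set, and with heavily unbalanced classes a criterion other than plain accuracy may be preferable.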

Keywords/Search Tags:

Filtered attribute selection, Attribute independent, Classification threshold, Naive Bayesian, Spam messages

Related items

1	Research On Classification Methods And Its Application In Customer Recognition Of Banking
2	Analysis On How To Deal With Spam Messages
3	Study On The Selection Of Risky Assets Based On Attribute Reduction Theory
4	Research On Improving Naive Bayes Classifiers And Its Application
5	Detection Research Of Financial Irregularity Of Chinese Listed Company Based On Naive Bayes Classification Algorithm
6	Comparative Study On Multi-attribute Decisions&Multi-attribute Auctions Of Logistics Selection Of Suppliers
7	Research On Multiple Attribute Decision Making Methods Considering Equilibrium Attribute
8	Study On The Positioning And Trend Of Gold In The Changing Global Economy System
9	Crowdsourcing Task Assignment Method Based On The Attribute Space Of Pairwise Associations And Attribute Selection
10	Method For Multiple Attribute Two-sided Matching Decision-making Considering The Attribute Aspirations