
Improvement And Research Of Naive Bayes Classification Algorithm

Posted on: 2022-02-21 | Degree: Master | Type: Thesis
Country: China | Candidate: J M Wang | Full Text: PDF
GTID: 2480306314970049 | Subject: Mathematics
Abstract/Summary:
With the advent of the era of big data, the classification of massive data has become a very important task. Bayes classifiers are a family of classification algorithms based on Bayes' theorem that draw on probability theory and statistics. Among them, the Naive Bayes Classifier (NBC) is the simplest and most widely used; in many fields it is competitive with classic classification algorithms such as decision trees, k-Nearest Neighbors, and neural networks. However, the Naive Bayes Classifier rests on the assumption of attribute conditional independence. Although this assumption is simple, it often fails to hold in practice, which degrades the classification accuracy of the Naive Bayes Classifier. To address this problem, this thesis improves the Naive Bayes Classifier from the perspective of attribute features, as follows.

Firstly, to relax the overly strict attribute conditional independence assumption, an orthogonal matrix is used to apply an orthogonal transformation to the continuous attribute features. The transformation removes the linear correlation between attributes and strengthens their conditional independence, bringing the data closer to the assumption underlying the Naive Bayes Classifier. Then, under the assumption that the continuous attributes follow a normal distribution, the class conditional probability of each continuous attribute feature is estimated from its probability density function, and samples of unknown class are classified accordingly. The improved algorithm is tested on typical data sets, and the experimental results show that the improved algorithm (INB) performs significantly better than NBC and the Bayesian Network (BN).

Secondly, for Naive Bayes classification of discrete attributes, a new method based on the Naive Bayes Classifier is proposed: the discrete attribute features are marked numerically, turning the discrete attributes into continuous ones so that arithmetic operations can be carried out, and an orthogonal transformation is then applied to the numerically encoded features. By the central limit theorem, when the sample size is large the sampling distribution of the relevant statistics tends to a normal distribution, so the probability density function is again used to estimate the class conditional probabilities of the originally discrete attributes. Finally, the test data sets are classified by the Bayes decision rule that minimizes the classification error rate. The experimental results show that, compared with the standard NBC and BN, the classification accuracy of the numerically encoded algorithm is significantly improved, which demonstrates the rationality and effectiveness of making discrete attributes continuous and applying the orthogonal transformation.
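To make these two improvements concrete, the following is a minimal sketch in Python, not the author's implementation, of the shared pipeline: an orthogonal transformation of the attribute features followed by Gaussian class conditional density estimation and the minimum-error Bayes decision rule. The particular orthogonal matrix used here (the eigenvectors of the sample covariance matrix), the class name, and all variable names are illustrative assumptions; the abstract states only that an orthogonal matrix is applied and that the transformed attributes are treated as normally distributed.

```python
import numpy as np


class OrthogonalGaussianNB:
    """Gaussian Naive Bayes applied after an orthogonal (decorrelating) transformation."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Orthogonal matrix Q: eigenvectors of the sample covariance of X.
        # Projecting onto Q removes linear correlation between attributes,
        # moving the data closer to the conditional independence assumption.
        cov = np.cov(X, rowvar=False)
        _, self.Q_ = np.linalg.eigh(cov)
        Z = X @ self.Q_
        # Per-class Gaussian parameters and class priors in the transformed space.
        self.mu_ = np.array([Z[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([Z[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        Z = np.asarray(X, dtype=float) @ self.Q_
        # log p(c) + sum_j log N(z_j; mu_cj, var_cj), treating the transformed
        # attributes as conditionally independent normal variables.
        log_like = -0.5 * np.sum(
            np.log(2 * np.pi * self.var_)[None, :, :]
            + (Z[:, None, :] - self.mu_[None, :, :]) ** 2 / self.var_[None, :, :],
            axis=2,
        )
        # Bayes decision rule minimizing the classification error rate:
        # choose the class with the largest (log) posterior.
        return self.classes_[np.argmax(self.log_prior_ + log_like, axis=1)]
```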
Finally, for Naive Bayes classification of mixed attributes, building on the Naive Bayes Classifier, the discrete attributes in the mixed data set are numerically preprocessed, and the continuous attributes together with the numerically encoded discrete attributes are then transformed by an orthogonal matrix. This weakens the strong constraint imposed by the attribute conditional independence assumption of the Naive Bayes Classifier. The class conditional probabilities are then estimated from the probability density functions of all attributes, and samples of unknown class are classified by the Bayes decision rule that minimizes the classification error rate. Experimental results show that the improved algorithm improves classification accuracy.
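The mixed-attribute case can be sketched by reusing the class above: discrete attributes are first given integer numerical marks, concatenated with the continuous attributes, and then passed through the same orthogonal transformation and Gaussian estimation steps. The particular encoding below (the index of each category in sorted order) and the toy data are assumptions for illustration, since the abstract does not specify the numerical marks used.

```python
import numpy as np


def encode_discrete(column):
    """Give each distinct category an integer numerical mark (0, 1, 2, ...)."""
    marks = {v: i for i, v in enumerate(sorted(set(column)))}
    return np.array([marks[v] for v in column], dtype=float)


# Toy mixed data set: one continuous attribute and one discrete attribute.
continuous = np.array([[1.2], [0.8], [3.1], [2.9]])
discrete = encode_discrete(["red", "red", "blue", "blue"]).reshape(-1, 1)
X = np.hstack([continuous, discrete])   # all attributes are now numeric
y = np.array([0, 0, 1, 1])

# Reuses the OrthogonalGaussianNB class from the sketch above.
model = OrthogonalGaussianNB().fit(X, y)
print(model.predict(np.array([[3.0, 0.0]])))   # expected output: [1]
```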
Keywords/Search Tags: Naive Bayes, attribute independence, orthogonal transformation, numerical mark, central limit theorem