
Two-stage Reinforcement Naive Bayesian Classifier

Posted on: 2024-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2557307052981629
Subject: Applied statistics
Abstract/Summary:
Naive Bayes (NB) is a classical machine learning method with excellent classification accuracy and robustness to noise. The classical algorithm rests on two pillars: Bayes' theorem, which states that the posterior probability is obtained from the prior probability and the joint probability; and the conditional independence assumption, under which features are taken to be independent of one another, so that the joint probability is calculated by multiplying the features' conditional probabilities together. However, because this strong independence assumption is unlikely to hold in practical applications, it has to some extent limited the wider adoption of the naive Bayes model. Scholars have proposed many improvements to weaken the assumption, which fall broadly into three directions: optimising the structure of Bayesian networks, selecting or weighting features, and selecting or weighting samples.

A second issue is that the conditional probabilities of features are calculated differently depending on the features' probability distributions. Since naive Bayes was originally applied to text classification, where features are qualitative, a single distributional assumption is usually made; for quantitative variables, scholars tend to adopt a discretization approach. For data containing mixed feature types, this discretization of quantitative features can destroy the integrity of the data, so that the learned conditional probabilities are not accurate enough. It has further been argued that conditional-probability estimates are unreliable when samples are small, and a fine-tuned naive Bayes method was proposed to increase the reliability of the estimated information. That method, however, was designed for text classification, improving classifier performance by adjusting the conditional probabilities of qualitative features; it does not apply to the conditional probabilities of quantitative features, so this thesis proposes a fine-tuning method for conditional probabilities based on Gaussian distributions.

On this basis, in order to extend the applicability of the naive Bayes model and improve its generalization performance, this thesis proposes an improvement on classical NB: the Two-stage Reinforcement Naive Bayes Classifier (TRNB). TRNB is a model based on multiple distributions, assuming a multinomial distribution for qualitative features and a Gaussian distribution for quantitative features. The model is trained in a two-stage framework. In the first stage, the features are weighted: features are ranked by mutual information, unselected features are added to the feature subset in sequence, the training set is predicted to obtain the mean squared error of classification, the subset with the smallest mean squared error is chosen as the optimal feature subset, and the feature-weighted classifier is learned (a sketch of this stage follows). In the second stage, the conditional probabilities are adjusted: the feature-weighted classifier learned in the first stage predicts the whole training set, yielding the initial conditional probabilities and the set of misclassified samples.
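To make the first stage concrete, here is a minimal Python sketch assuming scikit-learn-style building blocks (mutual_info_classif, GaussianNB). For brevity it treats all features as Gaussian, whereas TRNB also assumes a multinomial distribution for qualitative features; the function name and the exact form of the classification MSE are illustrative assumptions, not the thesis's code.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import GaussianNB

def first_stage_select(X, y):
    """First stage (sketch): rank features by mutual information,
    grow the subset greedily, and keep the subset with the smallest
    training mean squared error of classification."""
    order = np.argsort(mutual_info_classif(X, y))[::-1]      # highest MI first
    classes = np.unique(y)                                   # matches clf.classes_
    onehot = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    best_subset, best_mse = order[:1], np.inf
    for k in range(1, len(order) + 1):
        subset = order[:k]
        clf = GaussianNB().fit(X[:, subset], y)
        proba = clf.predict_proba(X[:, subset])
        mse = np.mean((proba - onehot) ** 2)  # MSE of classification on train set
        if mse < best_mse:
            best_subset, best_mse = subset, mse
    return best_subset, GaussianNB().fit(X[:, best_subset], y)
```

The greedy forward pass over the MI ranking is what makes this a wrapper method: each candidate subset is evaluated by actually training and scoring the classifier, rather than by a filter statistic alone.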
Finally, the optimal conditional probabilities are obtained and the TRNB classifier is learned. By wrapper-based feature weighting in the first stage, TRNB weakens the strong assumption of conditional independence among features; by adjusting the conditional probabilities in the second stage, it makes the learned estimates more reliable, ultimately achieving an effective improvement in classifier performance (a sketch of this adjustment follows the experimental summary below).

In the experimental part of this thesis, the generalisation of TRNB is validated on 33 UCI benchmark datasets by comparing classification accuracy and running time. We find that the average classification accuracy of TRNB is significantly better than that of standard naive Bayes, naive Bayes with feature weighting by minimised mean squared error, naive Bayes with feature subset selection, naive Bayes with correlation-based feature weighting, and fine-tuned naive Bayes. Within the two-stage reinforcement framework we also propose a TRNB variant with correlation-based feature weighting and a TRNB variant with feature weighting by minimised mean squared error, i.e. filter-based and embedded feature-weighting methods for the first stage. Comparing classification accuracy and running time, we find that the TRNB proposed in this thesis, with wrapper-based feature weighting in the first stage, yields the most significant improvement in average accuracy. In the third part of the experiments we explore the performance of all the models at different sample sizes and find that TRNB performs particularly well on larger samples (>500 samples). Taken together, the extensive experimental results show that TRNB significantly outperforms all the algorithmic models compared in this thesis.
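Complementing the first-stage sketch above, the following is a minimal, hypothetical illustration of the second-stage adjustment: the fitted classifier predicts the training set, and each class's Gaussian conditional means are nudged toward that class's misclassified samples. The update rule, the learning rate lr, and the stopping criterion are illustrative assumptions, not the thesis's actual fine-tuning procedure.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def second_stage_finetune(clf, X, y, lr=0.1, max_iter=20):
    """Second stage (sketch): predict the training set, collect the
    misclassified samples, and shift each class's Gaussian means toward
    its own misclassified samples, keeping the parameters with the
    fewest training errors. X must use the same feature subset the
    classifier was trained on."""
    best_theta = clf.theta_.copy()
    best_errors = int((clf.predict(X) != y).sum())
    for _ in range(max_iter):
        wrong = clf.predict(X) != y              # current misclassified set
        if not wrong.any():
            break
        for c, label in enumerate(clf.classes_):
            mask = wrong & (y == label)
            if mask.any():                       # move the mean toward the errors
                clf.theta_[c] += lr * (X[mask].mean(axis=0) - clf.theta_[c])
        errors = int((clf.predict(X) != y).sum())
        if errors < best_errors:
            best_theta, best_errors = clf.theta_.copy(), errors
    clf.theta_ = best_theta                      # retain the best parameters found
    return clf
```

In the full TRNB pipeline, clf and X would be the feature-weighted classifier and the optimal feature subset returned by the first stage.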
Keywords/Search Tags:Feature weighting, Fine-tuning, Incremental learning, naive Bayes