
Optimization Of Feature Selection Based On Genetic Algorithm In Bayesian Classification

Posted on: 2020-01-17 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wang | Full Text: PDF
GTID: 2518306104495804 | Subject: Software engineering
Abstract/Summary:
With the development of technology, the dimensionality of real-world data has gradually increased, the volume of data has grown dramatically, and the application scenarios of data classification have become ever wider. How to effectively reduce redundant features, so as to achieve dimensionality reduction and improve classification accuracy, has therefore become one of the current research hotspots. The Naive Bayes algorithm requires that the features of a data set be mutually independent, so its classification performance is not ideal on complex data sets. Optimizing feature selection in combination with the Naive Bayes algorithm, in order to improve the classifier's performance on complex data sets, is thus a worthwhile research direction.

Considering the respective advantages and disadvantages of the filter and wrapper models of feature selection, as well as the limitation of the Naive Bayes independence assumption, a hybrid-model algorithm is proposed: a Naive Bayes optimization algorithm based on information gain and a genetic algorithm. The algorithm has two stages. First, an information-gain method based on the filter model performs initial feature selection and produces a reduced feature subset. Then a genetic algorithm, whose fitness function incorporates the classification accuracy of the Naive Bayes classifier, uses its global search capability to select the optimal feature subset, and an optimized Bayesian classifier is constructed on that subset. To address the premature convergence of the genetic algorithm, an adaptive crossover rate, a variance-based calculation method, and an optimal-individual (elitist) retention strategy are introduced.

In the subsequent experiments, several data sets were selected and the proposed algorithm was compared with a variety of algorithms to analyze its classification performance and feature selection effect, its stability and convergence, and the relationship between the similarity threshold and algorithm performance. The experiments show that the proposed optimization algorithm has a clear feature selection effect, and its classification accuracy is significantly improved over the standard Naive Bayes classifier. The discretization method used by the algorithm is relatively simple and has some impact on classification performance, so it can be further optimized. Likewise, the information-gain threshold is currently chosen empirically; an adaptive threshold method can be studied in future work.
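The two-stage filter-plus-wrapper pipeline described above can be illustrated with a short sketch. This is a minimal illustration only, not the thesis's implementation: it assumes scikit-learn's mutual_info_classif as a stand-in for the information-gain filter, GaussianNB in place of the thesis's discretized Naive Bayes classifier, and a simple bit-string genetic algorithm with elitism and an adaptive crossover rate; the data set, thresholds, and GA hyperparameters are placeholders.

```python
# Sketch of the hybrid filter + wrapper feature selection pipeline.
# Assumptions (not from the thesis): mutual_info_classif stands in for the
# information-gain filter, GaussianNB for the discretized Naive Bayes
# classifier; GA operators and all hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

# --- Stage 1 (filter): keep features whose score exceeds a threshold ---
scores = mutual_info_classif(X, y, random_state=0)
keep = np.where(scores > np.median(scores))[0]   # empirical threshold
X_f = X[:, keep]
n = len(keep)

def fitness(mask):
    """Fitness = cross-validated Naive Bayes accuracy on the selected subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(GaussianNB(), X_f[:, mask.astype(bool)], y, cv=3).mean()

# --- Stage 2 (wrapper): GA search over subsets of the filtered features ---
pop_size, generations = 20, 30
pop = rng.integers(0, 2, size=(pop_size, n))
for g in range(generations):
    fit = np.array([fitness(ind) for ind in pop])
    # Optimal-individual retention: the best chromosome survives unchanged.
    elite = pop[fit.argmax()].copy()
    # Adaptive crossover rate: cross over less as the population converges.
    diversity = fit.std() / (fit.mean() + 1e-12)
    p_cross = 0.5 + 0.4 * min(diversity, 1.0)
    # Fitness-proportional selection of parents.
    probs = fit / fit.sum()
    parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        if rng.random() < p_cross:
            cut = rng.integers(1, n)   # single-point crossover
            children[i, cut:], children[i + 1, cut:] = (
                parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # Bit-flip mutation.
    flip = rng.random(children.shape) < 0.02
    children[flip] ^= 1
    children[0] = elite
    pop = children

best = pop[np.array([fitness(ind) for ind in pop]).argmax()]
print("selected features:", keep[best.astype(bool)])
print("CV accuracy:", fitness(best))
```

The sketch keeps the two decisions the abstract emphasizes: the filter stage prunes the search space before the genetic algorithm runs, and the wrapper stage ties the fitness function directly to Naive Bayes accuracy, with elitism guarding against losing the best subset between generations.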
Keywords/Search Tags:Genetic algorithm, Naive Bayes, Information gain, Feature selection