
Learning Bayesian Network Structure Based On Information Theory And Causal Effect

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Long
Full Text: PDF
GTID: 2428330629952722
Subject: Software engineering
Abstract/Summary:
Machine learning, an important area of artificial intelligence, is the study of algorithms that learn effectively and efficiently from observed data and can then make predictions on future data. The Bayesian network, a graphical representation of a joint probability distribution, plays an important role in the development of machine learning. Owing to their high classification accuracy and interpretability, Bayesian network classifiers (BNCs) have been applied in many domains, such as text classification, medical diagnosis and financial forecasting, and a growing body of work has proposed and validated a wide variety of BNCs.

Naive Bayes (NB) is the simplest BNC; it assumes that features are independent given the class. Many researchers have suggested that relaxing this conditional independence assumption can improve classification performance. Because NB encodes no conditional dependence between features, it can be regarded as a 0-dependence BNC. Extending NB to 1-dependence BNCs and further to arbitrary k-dependence BNCs, as in Tree-Augmented Naive Bayes (TAN) and the k-Dependence Bayesian Classifier (KDB), has been one of the main improvement techniques. As NB is extended, conditional dependencies between features are selectively added to the network structure. In other words, the more highly significant interdependencies a BNC encodes, the higher the accuracy the classifier can achieve, and many techniques have therefore been developed to identify the significance of such interdependencies.

Improving BNCs with an evaluation function is one of the common optimization approaches: the network structure that best satisfies the evaluation function is regarded as the best one. Such a function usually describes the conditional probability distribution in some sense, for example the likelihood function, mutual information or conditional mutual information, and it can make the structure learning procedure of BNCs more effective.

A single-structure BNC usually has to increase its order of feature dependence to represent complex conditional dependence relationships between features, which also increases the risk of overfitting. Ensemble BNCs address this problem by combining the classification results of many simple sub-classifiers. Because the sub-classifiers are easy to build, together capture a large amount of information, and can be combined in various flexible ways, ensemble BNCs have become one of the popular means of optimizing BNCs.

This paper proposes to optimize BNCs in two ways. First, through a theoretical analysis of Kullback-Leibler divergence, it proves that the difference between the entropies of BNCs is a conditional mutual information of different order, which represents the conditional dependence relationships between features; it therefore proposes to extend BNCs by high-order conditional mutual information. The training set is divided into subsets by class label, and the information contained in each subset determines the final classification decision. Second, based on our analysis of log-likelihood and entropy, the paper shows that maximizing the log-likelihood reduces the uncertainty of the model; it therefore proposes to improve the classification performance of BNCs by combining heuristic feature sorting with conditional dependence analysis based on conditional mutual information.
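As an illustration of the quantities referred to above (the notation here is generic and may differ from the exact definitions used in the thesis), the conditional mutual information between features $X_i$ and $X_j$ given the class $C$ is

$$
I(X_i; X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)},
$$

which is zero exactly when the two features are conditionally independent given the class, as NB assumes. The connection between log-likelihood and entropy underlying the second contribution can likewise be sketched: for i.i.d. samples drawn from the true distribution $P$,

$$
\frac{1}{N} \sum_{n=1}^{N} \log P_\theta\big(x^{(n)}, c^{(n)}\big) \;\longrightarrow\; -H(X, C) - D_{\mathrm{KL}}\big(P \,\|\, P_\theta\big) \quad (N \to \infty),
$$

so maximizing the average log-likelihood amounts to minimizing the Kullback-Leibler divergence between the data distribution and the model, that is, reducing the model's uncertainty about the data.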
To validate the classification performance of our proposed methods, this paper analyzes our algorithms in terms of 0-1 loss, bias and variance. The experimental results on UCI data sets indicate that the classification performance of our algorithms has a significant advantage over classical BNCs, which also demonstrates the effectiveness of the proposed methods.
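As a minimal, hypothetical sketch of how conditional mutual information can be estimated from discrete training data and used to rank candidate feature dependencies (a generic illustration under simple empirical-count assumptions, not the algorithm proposed in the thesis; the function name cmi and the toy data are assumptions):

import numpy as np
from itertools import combinations
from collections import Counter

def cmi(xi, xj, y):
    """Estimate I(Xi; Xj | C) from discrete samples via empirical counts."""
    n = len(y)
    joint = Counter(zip(xi, xj, y))   # counts of (xi, xj, c)
    xi_c = Counter(zip(xi, y))        # counts of (xi, c)
    xj_c = Counter(zip(xj, y))        # counts of (xj, c)
    c_cnt = Counter(y)                # counts of c
    value = 0.0
    for (a, b, c), n_abc in joint.items():
        p_abc = n_abc / n
        # P(xi, xj | c) / (P(xi | c) * P(xj | c)) expressed with counts
        ratio = (n_abc * c_cnt[c]) / (xi_c[(a, c)] * xj_c[(b, c)])
        value += p_abc * np.log2(ratio)
    return value

# toy example: rank feature pairs by conditional mutual information
X = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1],
              [0, 1, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

pairs = sorted(
    ((i, j, cmi(X[:, i], X[:, j], y)) for i, j in combinations(range(X.shape[1]), 2)),
    key=lambda t: t[2], reverse=True)
for i, j, v in pairs:
    print(f"I(X{i}; X{j} | C) = {v:.4f}")

In TAN- and KDB-style learners, a ranking of this kind typically decides which feature-to-feature arcs are added to the network; the high-order extension and heuristic feature sorting described in the abstract go beyond this basic version.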
Keywords/Search Tags: Bayesian network classifier, Kullback-Leibler divergence, conditional dependence between features, conditional mutual information, heuristic feature sorting, conditional dependence analysis