
Research On Discriminative Learning Of Tree-structured Bayesian Network Classifiers

Posted on: 2012-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Wang
GTID: 1118330335951393
Subject: Computer Science and Technology
Abstract/Summary:
Most intelligent applications, such as automatic recognition, prediction, and diagnosis systems, depend on classification techniques. The study of learning algorithms that automatically build a classifier from data is one of the principal topics in data mining and machine learning. Through continued research, the performance of Bayesian network classifiers has improved considerably in recent years. Recently, some researchers have recognized that learning a classifier to maximize classification accuracy is different from building a Bayesian network. An optimization algorithm for building a Bayesian network mainly aims to approximate the distribution of the variables in the data. An optimization algorithm for a Bayesian network classifier should instead focus on both effectiveness and efficiency. Two learning strategies exist: generative learning and discriminative learning. A discriminative learning scheme aims to improve classification accuracy and can compensate for a mismatch between the network structure and the distribution of the variables, so it is better suited to designing algorithms for learning classifiers. However, discriminative learning does not have mathematical properties as favorable as those of generative learning, and its computational cost is higher. It is therefore necessary to develop a classification model that integrates mechanisms of discriminative learning to improve classification accuracy while reducing training time. It is also necessary to analyze the practical value of such models on real data. Addressing several crucial issues that arise in real applications, this dissertation studies tree-structured Bayesian network classifiers (TBNC). The main contributions are as follows.

(1) Discriminative parameter learning algorithms are shown to handle structures with redundant edges poorly. First, the relation between a Bayesian network structure and the true joint probability distribution of the variables is expressed quantitatively, and a network structure that is more complex than the true one is defined as a structure with redundant edges. Second, an experiment is designed to verify that structures with redundant edges arise in many situations and that such structures reduce the performance of a Bayesian network classifier. The result also indicates that it is necessary to consider a TBNC for classification problems. Moreover, a preprocessing algorithm, RSD (Reducing Structure by Derivatives), is proposed, which identifies redundant edges using partial derivatives of the log conditional likelihood. Experimental results show that RSD significantly improves classification accuracy.

(2) TBNCs that share the same base structure, ignoring edge directions, are shown to be equivalent. First, the search space of TBNC structures and the equivalence classes of TBNC structures are analyzed, and it is shown that the directions of the edges in a TBNC do not affect classification accuracy. Second, a learning framework for TBNCs is presented, called LFWAR (A Learning Framework of TBNC Without Considering Arc Reversal). Finally, experimental results show that LFWAR has no statistically significant effect on the classification accuracy or stability of the classifiers, and that it can reduce training time when used in scene classification.

(3) The training process of a TBNC structure is shown to be robust on non-i.i.d. data sets. Fisher's method for combining p-values is applied to design a structure learning algorithm for TBNCs in order to analyze the robustness of the training algorithms. Although Fisher's method can improve the performance of a Bayesian network built on a non-i.i.d. data set, in experiments it hardly improves the classification accuracy of a TBNC. Hence, the training process of a TBNC structure is robust on a non-i.i.d. data set.

(4) Discriminative parameter learning methods for TBNCs are shown to be sensitive to noisy data, so a smoothing method is needed. First, since the traditional boosting method used in TBNC learning can only handle binary classification, the dissertation extends it into a new ensemble algorithm of Bayesian networks for multi-class classification. Second, the SmoothedBNB algorithm (Smoothed Bayesian Network Boosted Classifier) is proposed to deal with noisy and unbalanced data through a novel confidence function, which uses the classification confidence to bound the weights of noisy data. Finally, experimental results show that SmoothedBNB effectively improves the performance of classifiers on noisy and unbalanced data.

The results of this dissertation demonstrate the difference between a Bayesian network classifier and a Bayesian network in several respects. They show the value of TBNCs and advance the development of discriminative learning strategies. Moreover, the dissertation lays a sound theoretical foundation for further real-world applications.
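For readers unfamiliar with Fisher's method mentioned in contribution (3), the following is a minimal sketch of the standard combination rule only. The abstract does not say which tests produce the p-values or how the data are partitioned, so the function name `fisher_combined_pvalue` and its inputs are illustrative assumptions, not the dissertation's algorithm. Given k independent p-values, the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration; returns (statistic, combined_p).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic: X = -2 * sum(ln p_i), chi-square with 2k
    # degrees of freedom when all null hypotheses hold.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    stat, p = fisher_combined_pvalue([0.05, 0.05])
    print(f"statistic={stat:.4f}, combined p-value={p:.4f}")
```

With a single p-value the combined value equals the input, which is a quick sanity check on the closed form. The method assumes the individual tests are independent.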
Keywords/Search Tags: Data mining, Bayesian network classifier, Discriminative learning strategy, Structure learning, Parameter learning