
Study On Bayesian Network Classification Models And Its Application In Credit Scoring

Posted on: 2008-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Li
GTID: 1119360215959144
Subject: Management Science and Engineering
Abstract/Summary:
As an emerging discipline, data mining plays an increasingly important role in decision support across all fields. A Bayesian network is a tool for knowledge representation and reasoning under uncertainty, and it has several advantages over other data mining tools. Modeling and reasoning with Bayesian networks have been applied successfully in business intelligence, medical diagnosis, natural language understanding, fault diagnosis, heuristic search, image interpretation, goal recognition, and other areas to discover uncertain relations among events or attributes. To improve the classification performance of Bayesian networks and broaden their scope, this dissertation studies Bayesian network classifier algorithms and investigates their application to credit scoring.

The main original results are as follows:

1. Tree augmented naive Bayesian classifier (TAN) and tree augmented naive Bayesian multi-net classifier (TAMN) with mixed attributes. TAN and TAMN often require continuous variables to be discretized. Allowing TAN and TAMN to handle mixed-mode data directly is important because it represents the data distribution better and avoids information loss. The maximum likelihood function for hybrid data is derived, and the log-likelihood of the continuous attributes is separated from that of the discrete ones. Based on the directed minimal weight spanning tree algorithm from graph theory, an Extended Tree Augmented Naive Bayesian classifier and a Tree Augmented Naive Multi-Net classifier algorithm are proposed. A new method is introduced for handling the conditional distribution of a discrete child node with continuous parent nodes. The method avoids fitting this conditional distribution with a soft-thresholding function or a neural network, reduces the computational load, and increases classification accuracy.
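For orientation, the structure search of the standard discrete TAN, which the extended classifiers build on, can be sketched as below. This is only a minimal illustration of the classic construction (conditional mutual information as edge weights, then a maximum weight spanning tree directed away from a root), not the mixed-attribute algorithm proposed in the dissertation. The function names, the choice of Prim's algorithm, and the toy data are all assumptions made for the example.

```python
import numpy as np

def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats for discrete 1-D arrays of equal length."""
    cmi = 0.0
    n = len(c)
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.sum() / n                      # P(C = cv)
        xi_c, xj_c = xi[mask], xj[mask]
        for a in np.unique(xi_c):
            p_a = np.mean(xi_c == a)              # P(Xi = a | C = cv)
            for b in np.unique(xj_c):
                p_b = np.mean(xj_c == b)          # P(Xj = b | C = cv)
                p_ab = np.mean((xi_c == a) & (xj_c == b))
                if p_ab > 0:
                    cmi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def tan_parents(X, y, root=0):
    """Return parent[i], the attribute parent of attribute i (-1 for the root).

    In the full TAN classifier the class variable is additionally a parent
    of every attribute; only the attribute tree is built here.
    """
    d = X.shape[1]
    w = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            w[i, j] = w[j, i] = conditional_mutual_information(X[:, i], X[:, j], y)
    parent = np.full(d, -1)
    in_tree = [root]
    # Prim's algorithm, always adding the heaviest edge leaving the tree.
    while len(in_tree) < d:
        best = None
        for u in in_tree:
            for v in range(d):
                if v not in in_tree and (best is None or w[u, v] > best[0]):
                    best = (w[u, v], u, v)
        _, u, v = best
        parent[v] = u                             # edge directed away from the root
        in_tree.append(v)
    return parent

# Hypothetical toy data: attribute 1 copies attribute 0, attribute 2 is unrelated.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x0 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
x2 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = np.column_stack([x0, x0, x2])
parents = tan_parents(X, y)
```

On this toy data the strongly dependent pair (attributes 0 and 1) is linked first, while the unrelated attribute 2 is attached with a zero-weight edge.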
Continuous attributes are modeled with a parametric method, which removes the need to discretize continuous variables. The proposed method thus handles hybrid variables within the TAN or TAMN framework. Experiments show that the proposed classifier achieves good classification accuracy.

2. A flexible augmented naive Bayesian classifier. A new flexible augmented naive Bayesian classifier algorithm based on the minimum description length principle is proposed. The algorithm adaptively selects a network structure ranging from the naive Bayesian classifier (NB) to the tree augmented naive Bayesian classifier (TAN), while retaining the computational simplicity and robustness of TAN. Stratified cross-validation on UCI data sets shows that the algorithm achieves good classification accuracy.

3. A naive Bayesian classifier based on discriminant analysis. To remedy the inability of NB to extract between-class information, a new hybrid classification model that integrates NB with discriminant analysis is proposed. First, Linear Discriminant Analysis (LDA) or Kernel Discriminant Analysis (KDA) is used to find the projection that best separates the data. Second, the original samples are projected onto it to obtain new samples. The classifier is then trained on the new samples with the NB algorithm. In this way, discriminant analysis and the NB algorithm are combined into one model. In experimental comparisons with NB and TAN, the proposed classifier achieves higher classification accuracy on many data sets.

4. Application to credit scoring. To address the credit scoring problem, conventional and improved Bayesian network classifiers are investigated on a real credit data set after data preprocessing. Compared with neural networks and with parametric and non-parametric models, Bayesian networks are more efficient and of greater practical value as credit scoring models.
Two aspects of Bayesian network credit scoring models are studied:

(1) Using Bayesian network classifier models, the classification error rate of credit scoring models under different preprocessing methods is investigated. The Bayesian network credit scoring models are tested by cross-validation on a real data set and compared with neural network models. The experimental results show that Bayesian network classifiers are well suited to the credit scoring problem.

(2) Combining the Minimum Overall Risk rule with Bayesian network classifiers, a new risk-based credit scoring model is proposed. The models are tested by cross-validation on a real data set under the Minimum Overall Risk rule and compared with the classification results of neural network and Bayesian network models under the Minimum Probability of Error rule. The results show that Bayesian network classifiers under the Minimum Overall Risk rule effectively reduce the risk of credit scoring.
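The difference between the two decision rules compared in (2) can be illustrated with a small sketch. It assumes that a classifier has already produced class posteriors and that a loss matrix is given; the posterior values, the loss values, and the function names below are hypothetical and are not taken from the dissertation's experiments.

```python
import numpy as np

def min_error_decision(posteriors):
    """Minimum Probability of Error rule: pick the most probable class."""
    return np.argmax(posteriors, axis=1)

def min_risk_decision(posteriors, loss):
    """Minimum Overall Risk rule: pick the action with the lowest expected loss.

    loss[a, c] is the cost of taking action a when the true class is c.
    """
    expected_risk = posteriors @ loss.T   # shape (n_samples, n_actions)
    return np.argmin(expected_risk, axis=1)

# Hypothetical posteriors for three applicants; columns are [good, bad].
posteriors = np.array([
    [0.9, 0.1],
    [0.7, 0.3],
    [0.4, 0.6],
])

# Hypothetical losses: accepting a bad applicant (5) costs more than
# rejecting a good one (1). Rows are actions [accept, reject].
loss = np.array([
    [0.0, 5.0],
    [1.0, 0.0],
])

by_error = min_error_decision(posteriors)       # 0 = good / accept, 1 = bad / reject
by_risk = min_risk_decision(posteriors, loss)
```

With these assumed numbers, the second applicant is accepted under the minimum error rule but rejected under the minimum risk rule, because the expected loss of accepting (0.3 × 5 = 1.5) exceeds that of rejecting (0.7 × 1 = 0.7).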
Keywords/Search Tags:data mining, Bayesian network, Bayesian network classifiers, credit scoring