The Study Of Second-Dependence TAN Model Based On Attribute Weighting

Posted on: 2017-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
GTID: 2308330482492279
Subject: Computer software and theory
Abstract/Summary:
A Bayesian network is a probabilistic graphical model that is widely used in data mining to handle problems involving uncertainty. Each node in the model represents an aspect of the problem as a random variable. The relationships among nodes are shown in a simple graph: nodes with a direct parent-child relationship are connected by a directed arc pointing from the parent to the child, and conditional probabilities express how strong or weak each relationship is. The final result is then computed and inferred from the model structure and the classification formula. Bayesian networks make it possible to reason from incomplete or uncertain knowledge, and they have been applied successfully in fields such as medical diagnosis, business intelligence, statistical decision-making, and target recognition.
Among the many Bayesian classification models, NB (the Naive Bayes classification model) has the simplest structure, but it is significantly limited by its strong assumption that the attributes are conditionally independent given the class. Many improved models based on NB have been proposed, and TAN (Tree-Augmented Naive Bayes) is one of them. In NB the attribute nodes are completely independent of one another, but in many practical problems the attributes are not unrelated, so TAN adds first-order (one-dependence) dependencies among the attributes. Evaluation of TAN's classification results shows, on the one hand, that the correlation among attributes needs to be considered and, on the other hand, that TAN's performance on large data sets is not ideal. After studying existing Bayesian models and the related statistics, this thesis proposes two improved models for TAN, H-TAN and WH-TAN, which significantly improve the classification performance of TAN in two steps.
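For reference, the difference between NB and TAN can be written with the standard textbook classification rules. The notation below is an assumption made for illustration and is not taken from the thesis: $c$ is a class value, $x_1,\dots,x_n$ are the observed attribute values, and $\pi(i)$ is the index of the single attribute parent of $X_i$ in the TAN tree (the root attribute has no attribute parent and is conditioned on the class alone).

```latex
% Naive Bayes: every attribute depends only on the class.
\hat{c}_{\mathrm{NB}} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)

% TAN: every attribute depends on the class and on at most one other attribute.
\hat{c}_{\mathrm{TAN}} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P\bigl(x_i \mid c,\, x_{\pi(i)}\bigr)
```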
First, each attribute node in the TAN model depends on only one other attribute node besides the class node and is assumed to be independent of the remaining attributes, so TAN can express only part of the dependencies among attributes. To address this, H-TAN adds a hidden parent node to each attribute node. This hidden parent represents the combined effect of the other attributes, so that the correlation among nodes is taken into account more fully. Second, in practice the correlation among attributes can be strong or weak. To make the final classification predictions more accurate, weights are introduced into the probability calculation, with their values measured by mutual information from information theory. This gives the WH-TAN model.
The new model thus improves on TAN in two respects. To verify the effectiveness of the algorithm, the experiments use 45 UCI data sets of different sizes, with 0-1 loss, bias, and variance as the evaluation measures, and analyze and compare the classification results of the different algorithms. The experimental data show that WH-TAN performs well and is an effective Bayesian model.
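The abstract does not say which variables the mutual information is computed between, nor how the weights enter the probability calculation. As a rough sketch only, the Python below estimates the empirical mutual information between one discrete attribute and the class, then normalizes the scores so they sum to 1. The function names `mutual_information` and `attribute_weights`, the attribute-versus-class pairing, and the normalization are all assumptions for illustration, not the WH-TAN definition.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def attribute_weights(columns, labels):
    """Hypothetical weighting: each attribute's MI with the class, normalized to sum to 1."""
    scores = [mutual_information(col, labels) for col in columns]
    total = sum(scores)
    if total == 0:
        # No attribute is informative; fall back to uniform weights.
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]
```

For example, `attribute_weights([[0, 0, 1, 1], [0, 1, 0, 1]], [0, 0, 1, 1])` gives the first attribute all of the weight, because it determines the class exactly while the second attribute is empirically independent of it.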
Keywords/Search Tags: Bayesian Network, Classification, TAN, H-TAN, WH-TAN