
Research And Construct K-dependence Causal Forest Based On One-order Augmented Tree

Posted on: 2018-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Cao
GTID: 2348330515973957
Subject: Engineering
Abstract/Summary:
If the nineteenth century was the era of the industrial revolution and the twentieth century the era of capitalist wars, then the twenty-first century is, deservedly, the era of information technology. With the rapid development of computer and Internet technology, information technology now reaches into every walk of social life, and the explosive growth of information and data brings tremendous opportunities and challenges to every field. Large, unordered data sets often contain unknown but valuable information, so the processing and application of big data has attracted many computer scientists, and data mining has become a hot topic in information technology research in recent years.

Because they use graphical models to express the relationships between attributes, Bayesian network models are clearer and easier to understand than many other methods. Grounded in information theory and probability theory, Bayesian networks stand out among data mining algorithms. A Bayesian network is a probabilistic causal model that can handle uncertain and incomplete problems. The first Bayesian classifier derived from Bayesian networks was Naive Bayes (NB), which makes the strictest assumption of attribute independence and is also the simplest Bayesian classification algorithm. Building on Naive Bayes, many Bayesian classifiers have been proposed by relaxing the independence assumption between attributes; among them, tree-augmented Naive Bayes (TAN), AODE and KDB are currently popular. TAN allows one-order dependence between attributes, AODE uses a multi-model structure, and KDB allows high-order dependence between attributes. Whether a classifier allows only one-order dependence, uses a multi-model structure, or allows high-order dependence, the key to improving its accuracy is to capture as many of the correct dependence relationships among the attributes as possible.

This paper presents a new Bayesian network classification algorithm, the k-Dependence Causal Forest (KCF). It extends the one-order maximum spanning tree into a forest that can represent high-order dependence, with the aim of including as many of the most important dependencies between attributes as possible. KCF combines features of TAN, AODE and KDB, and its main idea is as follows. First, KCF is built on the maximum spanning tree (MST). Next, each attribute in turn acts as the root node, and the dependency relationships between attributes are directed outward from that root. The number of sub-models therefore equals the number of attributes, and each sub-model is a one-order dependence TAN model. Finally, each non-root node in each sub-model selects k-1 parent nodes from the nodes on the path between itself and the root node.

To verify the classification accuracy of KCF, a large number of comparative experiments were carried out. Forty data sets from the UCI repository were used to compare five algorithms, and the accuracy and stability of KCF were evaluated with three measures: 0-1 loss, bias and variance. Finally, the structure of the KCF model is analyzed in detail from the viewpoint of the Markov blanket, showing that KCF expresses attribute dependencies better and explaining why it consistently achieves higher classification accuracy than the compared algorithms. Furthermore, because of its multi-model structure, KCF treats every relevant attribute fairly, so it can provide attribute-specific dependence analysis, which makes it well suited to medical diagnosis and treatment.
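KCF starts from the one-order maximum spanning tree used by TAN. In TAN the edge weight between two attributes is their conditional mutual information given the class, I(X_i; X_j | C), and the tree is a maximum-weight spanning tree over those weights. The sketch below shows that step under stated assumptions: discrete attributes, plain empirical frequencies without smoothing, and Prim's algorithm for the tree. The function names (`conditional_mutual_information`, `maximum_spanning_tree`, `tan_tree`) are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(rows, labels, i, j):
    """Empirical I(X_i; X_j | C) in nats, estimated from raw counts (no smoothing)."""
    n = len(rows)
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(rows, labels))
    n_ic = Counter((r[i], c) for r, c in zip(rows, labels))
    n_jc = Counter((r[j], c) for r, c in zip(rows, labels))
    n_c = Counter(labels)
    total = 0.0
    for (xi, xj, c), count in n_ijc.items():
        # P(xi, xj, c) * log[ P(xi, xj, c) P(c) / (P(xi, c) P(xj, c)) ], written with counts
        total += (count / n) * math.log(count * n_c[c] / (n_ic[(xi, c)] * n_jc[(xj, c)]))
    return total


def maximum_spanning_tree(num_attrs, weight):
    """Prim's algorithm on a complete graph; weight maps frozenset({u, v}) -> float.

    Returns the undirected tree edges as (u, v) pairs."""
    in_tree = {0}
    edges = []
    while len(in_tree) < num_attrs:
        # Heaviest edge leaving the current tree.
        u, v = max(
            ((a, b) for a in in_tree for b in range(num_attrs) if b not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        in_tree.add(v)
        edges.append((u, v))
    return edges


def tan_tree(rows, labels):
    """Hypothetical helper: CMI weights for every attribute pair, then the MST."""
    num_attrs = len(rows[0])
    weight = {
        frozenset((i, j)): conditional_mutual_information(rows, labels, i, j)
        for i, j in combinations(range(num_attrs), 2)
    }
    return maximum_spanning_tree(num_attrs, weight), weight
```

With four rows in which attribute 0 always equals attribute 1 within each class, the pair (0, 1) gets weight log 2 and is the first edge Prim's algorithm adds.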
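The construction of the KCF sub-models (one sub-model per root attribute, each non-root node drawing extra parents from its path to the root) can be sketched as follows. The abstract does not say how the k-1 parents are chosen, so two details here are assumptions: each node keeps its direct tree parent and adds up to k-1 further ancestors (giving at most k attribute parents, as the name "k-dependence" suggests), and those ancestors are ranked by the same pairwise weight used to build the tree. As in TAN and KDB, every attribute would also have the class as a parent; the sketch tracks only attribute parents. All function names are hypothetical.

```python
from collections import deque


def orient_tree(num_attrs, edges, root):
    """Direct the undirected tree away from `root`; returns {node: parent}, root -> None."""
    adj = {a: [] for a in range(num_attrs)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to_root(parent, node):
    """Ancestors of `node`, nearest first, ending at the root."""
    path = []
    p = parent[node]
    while p is not None:
        path.append(p)
        p = parent[p]
    return path


def kcf_submodels(num_attrs, edges, weight, k):
    """One structure per root: {root: {node: [attribute parents]}}.

    Assumption: keep the tree parent, then add up to k-1 further ancestors
    ranked by weight[frozenset({node, ancestor})], highest first."""
    models = {}
    for root in range(num_attrs):
        parent = orient_tree(num_attrs, edges, root)
        structure = {root: []}
        for node in range(num_attrs):
            if node == root:
                continue
            ancestors = path_to_root(parent, node)
            direct, others = ancestors[0], ancestors[1:]
            others.sort(key=lambda a: weight[frozenset((node, a))], reverse=True)
            structure[node] = [direct] + others[: k - 1]
        models[root] = structure
    return models
```

With k = 1 every sub-model reduces to a TAN tree rooted at a different attribute; larger k lets deeper nodes pick up high-order dependencies along their path, which is the KDB-like part of the design.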
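The abstract says KCF shares AODE's multi-model structure but does not state how the sub-models' outputs are combined. The sketch below assumes one plausible rule: each sub-model scores P(c) times the product of P(x_i | c, parents(x_i)) with Laplace-smoothed counts, its scores are normalized into a class posterior, and the posteriors are averaged with equal weight. Structures are given as {attribute: [attribute parents]}; `fit_submodel`, `joint_score` and `kcf_predict` are hypothetical names.

```python
from collections import Counter


def fit_submodel(rows, labels, structure):
    """Collect the counts needed for P(c) and P(x_i | c, parents(x_i))."""
    tables = {}
    for node, parents in structure.items():
        joint, context = Counter(), Counter()
        for r, c in zip(rows, labels):
            key = (c,) + tuple(r[p] for p in parents)
            joint[key + (r[node],)] += 1
            context[key] += 1
        tables[node] = (parents, joint, context)
    return Counter(labels), tables, len(labels)


def joint_score(model, x, c, values, classes):
    """Laplace-smoothed P(c) * prod_i P(x_i | c, parents); values[i] = #values of attribute i."""
    class_counts, tables, n = model
    score = (class_counts[c] + 1) / (n + len(classes))
    for node, (parents, joint, context) in tables.items():
        key = (c,) + tuple(x[p] for p in parents)
        score *= (joint[key + (x[node],)] + 1) / (context[key] + values[node])
    return score


def kcf_predict(models, x, classes, values):
    """Assumed combination rule: equal-weight average of per-model normalized posteriors."""
    avg = {c: 0.0 for c in classes}
    for model in models:
        scores = {c: joint_score(model, x, c, values, classes) for c in classes}
        z = sum(scores.values())
        for c in classes:
            avg[c] += scores[c] / z / len(models)
    return max(avg, key=avg.get), avg
```

For example, with two attributes the two sub-models would be `{0: [], 1: [0]}` (rooted at attribute 0) and `{1: [], 0: [1]}` (rooted at attribute 1), each fitted on the same training data before calling `kcf_predict`.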
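The experiments report 0-1 loss, bias and variance, but the abstract does not name the decomposition used. One widely used choice for comparing Bayesian classifiers is that of Kohavi and Wolpert (1996); the sketch below computes it for a single test instance, assuming a noise-free target (the true label has probability 1). Under that assumption the bias term (bias squared in their notation) plus the variance equals the expected 0-1 loss. The function name `kohavi_wolpert` is hypothetical.

```python
from collections import Counter


def kohavi_wolpert(predictions, true_label):
    """Bias, variance and expected 0-1 loss at one test instance.

    `predictions` are the labels predicted for the instance by classifiers
    trained on different training samples. Assumes a noise-free target."""
    m = len(predictions)
    p_hat = {y: cnt / m for y, cnt in Counter(predictions).items()}
    labels = set(p_hat) | {true_label}
    bias = 0.5 * sum(
        ((1.0 if y == true_label else 0.0) - p_hat.get(y, 0.0)) ** 2 for y in labels
    )
    variance = 0.5 * (1.0 - sum(p ** 2 for p in p_hat.values()))
    loss = 1.0 - p_hat.get(true_label, 0.0)  # expected 0-1 loss
    return bias, variance, loss
```

Per-data-set figures would then be averages of these three quantities over all test instances; for predictions `['a', 'a', 'b', 'a']` with true label `'a'` the loss is 0.25, split into a bias of 0.0625 and a variance of 0.1875.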
Keywords/Search Tags: Bayesian Network, High-order Dependence, Maximum Spanning Tree, Bayesian Classifier