
The Study And Design Of High-order Bayesian Network Classifier Model Based On Information Theory

Posted on: 2018-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: X L Jin
GTID: 2348330515973964
Subject: Engineering
Abstract/Summary:
With the continuous development of big data technology, the volume of data on the Internet is growing explosively, and abundant information is produced at every moment. How to extract useful information from this ocean of data and apply it to work, life and other areas has become a key issue, and data mining technology has emerged in response. The Bayesian network classifier is a commonly used model for data mining and large-scale data analysis. It builds on graph theory from discrete mathematics and uses a directed acyclic graph to represent the causal (dependency) relationships between attributes. Its mathematical foundation is Bayes' theorem from probability theory. However, learning an optimal Bayesian network structure has been proved to be NP-hard, so designing an excellent Bayesian network classifier is very difficult. In an unrestricted Bayesian network classifier, the dependencies between attributes may be very complex, which greatly increases the computational load and can cause over-fitting, reducing classification accuracy. Researchers have therefore proposed restricted Bayesian network classifiers, in which each training sample is divided into an attribute vector and a decision category.

Traditional restricted Bayesian network classifiers include the NB, TAN and KDB classifiers. The NB classifier is based on the assumption of class-conditional independence; its network model is the simplest and its classification is the most stable, but practical applications rarely satisfy the class-conditional independence assumption. The TAN classifier uses a maximum spanning tree algorithm to construct the network model, which is optimal among 1-order structures, but the model is still too simple to ensure good accuracy on large data sets. The KDB classifier is the most flexible and can be extended freely, but blindly increasing the limit K on the number of other attribute nodes that one attribute node may depend on (also called the order value) makes the network model too complex, and the resulting over-fitting greatly reduces classification accuracy. In all three of these restricted classifiers, the order value K is fixed.

Building on the KDB classifier, this thesis proposes a new construction algorithm for Bayesian network classifiers, named the KCB (K-Changing Bayesian) classifier, in which the order value K is gradually reduced during construction. This prevents the network from becoming too complex and over-fitting while preserving the accuracy and reliability of classification, so that constructing high-order Bayesian network classifiers becomes feasible. The thesis describes the construction process of the KCB classifier and presents two improvements to it: changing the arrangement of the attribute nodes during preprocessing, and using an extra value to control the decline of the order value K. The classifier based on the second improvement is named the KCBB classifier. Data sets selected from the UCI repository are used to test the KCB classifier, its derivative classifiers and the classic restricted Bayesian network classifiers in extensive experiments. Three measures (0-1 loss, bias and variance) are used to assess the strengths and weaknesses of the different classifiers, and win/draw/loss (W/D/L) tables summarize the experimental results. The results indicate that the proposed KCB classifier can simplify the generated Bayesian network model to some extent while maintaining classification accuracy, and that the 3-KCB and 4-KCB classifiers improve classification accuracy on large data sets.
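As an illustration of the idea of a changing order value, the following is a minimal Python sketch of KDB-style parent selection in which the limit K is supplied per position in the attribute ordering. It follows the standard KDB recipe (rank attributes by mutual information with the class, then pick parents by conditional mutual information). The function names and, in particular, the `decreasing_k` schedule are assumptions made for illustration only; the abstract does not state the exact rule by which KCB reduces K.

```python
from collections import Counter
from math import log


def mutual_info(xs, cs):
    """Empirical mutual information I(X; C) in nats for two discrete sequences."""
    n = len(xs)
    n_xc, n_x, n_c = Counter(zip(xs, cs)), Counter(xs), Counter(cs)
    return sum(
        (cnt / n) * log(cnt * n / (n_x[x] * n_c[c]))
        for (x, c), cnt in n_xc.items()
    )


def cond_mutual_info(xs, ys, cs):
    """Empirical conditional mutual information I(X; Y | C) in nats."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc, n_yc, n_c = Counter(zip(xs, cs)), Counter(zip(ys, cs)), Counter(cs)
    return sum(
        (cnt / n) * log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
        for (x, y, c), cnt in n_xyc.items()
    )


def build_structure(columns, labels, k_for_position):
    """Return {attribute index: list of parent attribute indices}.

    The class variable is an implicit parent of every attribute.
    k_for_position(pos) gives the parent limit K for the attribute
    placed at position pos in the ordering.
    """
    # Order attributes by decreasing mutual information with the class.
    order = sorted(
        range(len(columns)),
        key=lambda i: mutual_info(columns[i], labels),
        reverse=True,
    )
    parents, placed = {}, []
    for pos, i in enumerate(order):
        k = min(k_for_position(pos), len(placed))
        # Among already placed attributes, keep the k most dependent given the class.
        ranked = sorted(
            placed,
            key=lambda j: cond_mutual_info(columns[i], columns[j], labels),
            reverse=True,
        )
        parents[i] = ranked[:k]
        placed.append(i)
    return parents


def fixed_k(k):
    """Classic KDB: the same limit K for every attribute."""
    return lambda pos: k


def decreasing_k(k_start, step):
    """Hypothetical schedule: K drops by 1 every `step` attributes, never below 1.

    This is an assumed rule for illustration, not the rule used by KCB.
    """
    return lambda pos: max(1, k_start - pos // step)
```

Calling `build_structure(columns, labels, fixed_k(2))` gives a plain 2-dependence structure, while passing `decreasing_k(3, 1)` lets early attributes take more parents than later ones, which keeps the total number of arcs smaller than a fixed K of 3 would.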
Keywords/Search Tags: Data Mining, Information Theory, Classification, Higher-order Bayesian Network, KDB