Research And Implementation Of Dependency Analysis Based Bayesian Network Structure Learning And Classifier

Posted on: 2006-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: J H Guan
GTID: 2168360155452957
Subject: Computer software and theory
Abstract/Summary:
The objective of data mining is to learn knowledge from empirical data and to adapt to new environments. It attracts growing attention from researchers and application developers and is applied in many fields, such as computer science, engineering, mathematics, physics, and neuroscience. Many learning techniques and methods are now available, for example decision trees, decision rules, neural networks, statistical learning, and probabilistic graphical models. Based on the characteristics of these methods, several theoretical frameworks have been established: computational learning theory, Bayesian learning theory, classical statistical learning theory, and minimum description length.

The Bayesian network is a powerful tool for knowledge representation and reasoning under uncertainty. It has two components: a DAG (directed acyclic graph) and CPTs (conditional probability tables). In recent years, many Bayesian network structure learning algorithms have been developed; these generally fall into two groups: search-and-scoring methods and dependency-analysis methods. Bayesian network learning comprises two parts, structure learning and parameter learning. Because the parameters can be determined from the network structure and the data set, structure learning is the central problem.

Many tasks, including fault diagnosis, pattern recognition, and forecasting, can be viewed as classification, since each requires identifying the class labels of instances, each typically described by a set of features (attributes). Learning accurate classifiers from pre-classified data is a very active research topic in machine learning and data mining; over the past two decades, many algorithms have been developed, for example decision-tree and neural-network classifiers.
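Dependency-analysis methods of the kind described above typically decide whether to keep or drop an edge by testing conditional independence between two variables given a conditioning set, commonly using conditional mutual information as the test statistic. The following is a minimal sketch of such a test from discrete data; the record format and variable names are illustrative assumptions, not the thesis's actual implementation.

```python
import math
from collections import Counter

def conditional_mutual_information(data, x, y, z):
    """Estimate I(X; Y | Z) from a list of records (dicts mapping
    variable name -> discrete value).

    A dependency-analysis learner would declare X and Y conditionally
    independent given Z when this value falls below a threshold epsilon.
    """
    n = len(data)
    # Joint and marginal counts over the observed configurations.
    c_xyz = Counter((r[x], r[y], tuple(r[v] for v in z)) for r in data)
    c_xz = Counter((r[x], tuple(r[v] for v in z)) for r in data)
    c_yz = Counter((r[y], tuple(r[v] for v in z)) for r in data)
    c_z = Counter(tuple(r[v] for v in z) for r in data)

    # I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
    cmi = 0.0
    for (xv, yv, zv), n_xyz in c_xyz.items():
        cmi += (n_xyz / n) * math.log(
            (n_xyz * c_z[zv]) / (c_xz[(xv, zv)] * c_yz[(yv, zv)])
        )
    return cmi
```

When X and Y are perfectly dependent given Z the statistic approaches their mutual information; when they are conditionally independent it is near zero, so comparing it against a threshold ε yields the edge-inclusion decision.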
While Bayesian networks (BNs) (Pearl 1988) are powerful tools for knowledge representation and inference under uncertainty, they were not considered as classifiers until the discovery that Naïve Bayes, a very simple kind of BN that assumes the attributes are independent given the class node, is surprisingly effective. As an important method for discovering knowledge, the Bayesian network classifier is one of the most important research problems. This thesis discusses how to use Bayesian learning theory to discover patterns in data, and is divided into two parts: the research and implementation of CI-test-based BN structure learning, and the research on and experiments with BN classifiers.

First, we introduce some basic problems of Bayesian network structure learning and an algorithm, DABNL, based on conditional independence tests. We analyze the complexity and correctness of DABNL theoretically, and conduct experiments on three underlying Bayesian network models. The experimental results show that although DABNL cannot always recover the I-map from the data, it works well in many real domains. Its running time is roughly linear in the number of cases in the data set, so the algorithm can handle much larger data sets without the running time growing too fast. This CI-test-based structure learning algorithm is especially well suited to sparse networks.

Second, we empirically evaluate algorithms for learning four types of Bayesian network classifier: Naïve Bayes, tree-augmented Naïve Bayes (TAN), BN-augmented Naïve Bayes (BAN), and general BNs (GBN), where the latter two are learned using two variants of the conditional-independence-based BN-learning algorithm DABNL. We implemented these learners to test their effectiveness, and we also discuss factors that influence classifier accuracy, such as the threshold ε, training sample size, and preferences.
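The Naïve Bayes assumption mentioned above — attributes independent given the class node — leads to a very compact classifier. A minimal sketch for discrete attributes follows; the Laplace smoothing used here is an illustrative choice, not necessarily the thesis's parameter estimator.

```python
import math
from collections import Counter

class NaiveBayes:
    """Discrete Naive Bayes: P(c | a1..an) proportional to
    P(c) * product_j P(aj | c), by the conditional-independence
    assumption on the attributes given the class node."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.n_attrs = len(X[0])
        # attr_counts[(j, value, c)] = how often attribute j took `value` in class c
        self.attr_counts = Counter()
        self.attr_values = [set() for _ in range(self.n_attrs)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.attr_counts[(j, v, c)] += 1
                self.attr_values[j].add(v)
        return self

    def predict(self, row):
        best, best_lp = None, float('-inf')
        for c in self.classes:
            # Log-prior plus log-likelihood of each attribute value.
            lp = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                num = self.attr_counts[(j, v, c)] + 1          # Laplace smoothing
                den = self.class_counts[c] + len(self.attr_values[j])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

TAN and BAN relax the independence assumption by allowing a tree (or a general BN) over the attributes in addition to the class arc, which is where the CI-test-based learner and its threshold ε enter the picture.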
From the experimental results, the general trend is that the larger the data set, the fewer the prediction errors, and for some data sets the algorithm cannot achieve the best accuracy with a fixed threshold. The analysis of our experimental results suggests two ways to improve the unrestricted BN classifier: automatic threshold selection based on prediction accuracy, and wrapping BAN and NB together and returning the winner. We demonstrate empirically that this new algorithm works as expected. Collectively, these results argue that BN classifiers deserve more attention in the machine learning and data mining communities. In summary, this thesis discusses Bayesian network structure learning based on conditional-independence tests, uses automatic threshold selection in learning the structures of the BAN and GBN classifiers, and compares the prediction accuracy of each pair of classifiers. From the...
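The two improvements described above, automatic threshold selection and wrapping candidate classifiers together, can both be phrased as one search over (learner, ε) pairs scored on held-out accuracy. The sketch below is hypothetical: the function names, the learner interface, and the use of a single validation split are assumptions made for illustration, not the thesis's procedure.

```python
def select_threshold_and_model(learners, thresholds, train, valid, accuracy):
    """Wrapper-style selection: try each candidate CI-test threshold
    for each learner, score prediction accuracy on a held-out set,
    and return the winning (name, threshold, accuracy) triple.

    `learners` maps a name to a function learner(train, eps) -> classifier;
    `accuracy(clf, data)` scores a fitted classifier on `data`.
    """
    best = (None, None, -1.0)   # (name, eps, accuracy)
    for name, learn in learners.items():
        for eps in thresholds:
            clf = learn(train, eps)
            acc = accuracy(clf, valid)
            if acc > best[2]:
                best = (name, eps, acc)
    return best
```

For learners that ignore ε (such as plain NB), a single dummy threshold can be passed, so "BAN vs. NB, return the winner" falls out of the same loop.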
Keywords/Search Tags: mutual information, BN structure learning, conditional independence (CI) test, classifier