Font Size: a A A

Bayesian Network Learning Based On Multivalued Dependency Of Relational Database

Posted on:2014-01-26Degree:MasterType:Thesis
Country:ChinaCandidate:H J XiaFull Text:PDF
GTID:2248330395496755Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The problem of Bayesian Network (BN) learning is one of the key issues of knowledgediscovery in the past years, the structure learning algorithms which have been proposedgenerally fall into two categories, they are searching and scoring based and dependencyanalysis based methods. However, the problem about BN structure learning has not beensolved totally, there are usually a lot of variables in the real life application, and thecomputational complexity will rise exponentially in the number of variables whenever anadditional variable becomes available, so how to reduce the computational complexity of BNbecomes a crucial problem. Many researchers have proposed to utilize expert knowledge inthe procedure of BN learning, e.g., sort the attributes according to expert knowledge, or usethem to select subset of attributes, these methods are proved to be effective. In this paper, wewill take into account data dependencies among attributes in relational database, they are veryvaluable expert knowledge which can be helpful for learning BN.The Bayesian classification model proposed in this paper names the Functionaldependencies and local Multivalued dependencies based Naive Bayes(FM-NB), whichinherits the simplicity of Naive Bayes (NB) and has the advantage of Tree augmentedNB(TAN), i.e., the ability to maintain inter-dependencies among attributes, then it also relaxesthe conditional independence assumption of NB. We will use association rule technology tomine data dependencies implied by dataset in preprocess, which include Functionaldependencies (FDs) and Local Multivalued dependencies (LMVDs), then they can be utilizedto remove extraneous attributes and construct the initial network structures. We analysed theFDs in relational database, presented probabilistic inference rules corresponding toArmstrong’s axioms, and discoved that the attributes in the right hand of FD were redundantfor classification. Then we can mine the FDs in dataset before constructing classifier, usethem remove extraneous attributes, and build classification model based on the remainingattributes, this could reduce the number of attributes which closely related to thecomputational complexity. For the MVDs in database, we proved the relationship betweenMVDs and conditional independence, which sets up a linkage between MVDs and BN. MVD implies conditional independence, but it has strong constraint on the attribute set, andembedded multivalued dependency(EMVD) could be widely used in the real life, then we putforward the concept of LMVD which combines the advantages of them. In order to utilizeFDs and LMVDs in the procedure of BN learning, we discussed the local structurescorresponding to them in different cases, which can be used to build the initial networkstructure, then the whole network could be constructed based on it, this could help to simplifythe procedure of learning BN while still maintaining inter-dependencies among attributes.In order to verify performance of FM-NB model for classification, we implemented thealgorithm, a hybrid discretization method was used to deal with the continuous attributes indataset, which chose proper discretization method for different continuous a ttributes, so theinformation embedded in these attributes can be employed more; the samples which have nullvalue were removed from dataset. We conduct the experiments on9datasets from the UCImachine learning repository, and compare three other classification models on each benchmarkdataset, which are Selective Naive Bayes classifier using Sequential ForwardSelection(SNB-SFS), Tree augmented Naive Bayes classifier using classical floatingsearch(TAN-CFS) and Gain based Association Rule classifier(GARC). Experimental resultsshow that our FM-NB model has better classification accuracy, and more rules can be minedfor FM-NB than GARC when support is the same.
Keywords/Search Tags:Bayesian Network, Functional dependency, Multivalued dependency, Conditionalindependence, Association rule
PDF Full Text Request
Related items