Applying Functional Dependencies To Improve The Performance And Robustness Of Hidden Naive Bayes

Posted on: 2015-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: S D Tan
Full Text: PDF
GTID: 2268330428483196
Subject: Computer software and theory
Abstract/Summary:
A Bayesian network is a graphical model grounded in probability theory that offers a natural way to represent cause-and-effect information. It is intuitive, simple, efficient, and stable in performance, and it has become a powerful tool for knowledge discovery and reasoning under uncertainty. Research on Bayesian networks is now a hot topic in data mining. Bayesian networks have many applications, and the Bayesian classifier is one of them. Naive Bayes (NB) is the simplest Bayesian classifier. It assumes that all attributes are conditionally independent given the class, and it performs well in many domains. Tree-Augmented Naive Bayes (TAN) extends NB: it inherits the simplicity of NB but has greater expressive power, seeks a balance between efficiency and accuracy, and has obtained very good results. Hidden Naive Bayes (HNB) is a newer extension of NB. Whereas TAN restricts each attribute to at most one other attribute as a parent, HNB gives each attribute a hidden parent that combines the influences of all the other attributes. HNB can therefore express the relationships between variables more precisely. Because it avoids the structure-learning procedure, it is also more efficient and accurate.

Many researchers have noticed the similarities between Bayesian networks and the relational model. Functional dependency is an important concept in relational database theory. It expresses a strong dependency relationship between variables, and these relationships can be used to delete redundant attributes and to build Bayesian networks. In this thesis we use functional dependencies in the learning of HNB and present the Simplified Hidden Naive Bayes (SHNB) algorithm to improve the performance and robustness of HNB.

First, we use the functional dependencies to remove redundant attributes. We observe that when the set of functional dependencies contains a cycle, the removal procedure leads to information loss, so we give an algorithm that processes the set of functional dependencies before the redundant attributes are removed. Then, to express the relationships between attributes more precisely, we select an attribute sequence to restore the direction of the relationships in the Bayesian network.

Finally, to test the performance of SHNB, we select nine datasets from the UCI database and apply HNB and SHNB to them. We compare classification accuracy and the area under the ROC curve (AUC) to analyze performance. We also conduct an experiment on the SEER dataset. The results show that our work improves on HNB in both classification accuracy and ranking performance.
Keywords/Search Tags: Bayesian network, Bayesian classifier, Functional dependencies, Hidden Naive Bayes