Improvement And Implementation Of Factor Tree Inference Algorithm For Bayesian Networks

Posted on:2009-09-14

Degree:Master

Type:Thesis

Country:China

Candidate:W Dong

Full Text:PDF

GTID:2178360242980332

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Bayesian network was developed in 1980's and has been paid increasing attention to since 1990's. Compared with the early rule-based approaches, it has more clear semantics and usually makes more reasonable conclusions, but involves much more calculation.Generally, Inference in general Bayesian networks is a rather complex issue. So far tens of Inference algorithms have been developed to make Bayesian networks as practical as possible. All the algorithms fall into two categories- exact ones and approximate ones.Here, we only accurate Bayesian network inference algorithm in the discussion and research on, Bayesian inference networks precision in the most widely used algorithm of two classical algorithm: VE and CTP. VE allows one to remove nodes irrelevant to a given query while CTP allows computation sharing among multiple queries. However, their disadvantages are apparent too: VE has to eliminate all the hidden nodes for each single query, this will cause a large amount of redundant computation; when it comes to CTP, the complexity of initialization is very high. In our experiment, we prove that they do hold these properties written above. Still, we find that the effectiveness of VE is strongly influenced by the location of query and evidence nodes as well as the amount of evidence nodes. That is, in a pruned network, provided other factors which may influence the effectiveness of VE are fixed, the closer the query nodes are to the root nodes, the closer the evidence nodes are to the leaf nodes, and the larger amount of evidence nodes there are in the network, the more effective VE is. As to CTP, when there are fewer query nodes in the network, the effectiveness of upward and downward propagation algorithm performed in CTP is relatively low. What is more, when the multiple query nodes do not belong to any cluster in the junction tree, it is impossible for CTP to answer the joint posterior probability directly.There are such Bayesian networks in which the query and evidence nodes are restricted in a subset of the set of the nodes. That is, the set of the nodes in such a Bayesian network is divided into the set of query nodes Q, the set of evidence nodes E, and the set of hidden nodes H. Therefore, it is very natural for us to only care about the posterior probability of several nodes in Q given the observation of several nodes in E in such Bayesian networks. In 2004, Wenhui Liao,Weihong Zhang etc proposed an algorithm called FTI which constructs a factor tree to conserve the intermediate computation in order to simplify the process of upcoming queries. More specifically, each f node in a factor tree reserves a factor, and in each upcoming query FTI uses message passing algorithm (upward message passing algorithm only) to propagate evidences, at the same time eliminates each non-query variable in the factor and computes the product of all the factors where the posterior probability lies. Liao proved that the performance of FTI dealing with multiple queries in such Bayesian networks is the best among the three. However, the picture is not always rosy. In our experiment, we find that the correctness of FTI can not be guaranteed in some circumstances. More specifically, for some Bayesian networks, the result of factor tree construction algorithm performed in FTI is not a factor tree but a factor graph instead which means there is at least one loop. This is absolutely a disaster for FTI because message passing algorithm can only be applied on a tree rather than a graph. When it comes to this, FTI can not achieve the right posterior probability. We modify FTI for some extent, and prove that the algorithm modified does perform better than VE and CTP.In this paper, the efficiency of the algorithm through three experimental analysis, It is natural for VE to hold the lowest complexity (time complexity) for the first query: firstly, VE incorporates evidences before the elimination of non-query nodes therefore the size of factor in the elimination process becomes smaller; secondly, VE does not reserve any intermediate computation. CTP holds the highest complexity, because it has to construct a junction tree. However, it is very hard to compare FTI to determine which holds the lower complexity for the first query. We divide the first query into two steps: the initialization step and the query step. Therefore, the time spending for the first query equals to the summation of the time spending on the initialization and the query step. This paper can be derived through experiments: CTP uses upward and downward message passing algorithm, so it can answer any query (maybe not joint posterior probability) in a linear time after the two propagations. Therefore, the more query nodes there are in the network, the more effective CTP is. FTI reserves intermediate computation which lowers the complexity, however, it has to apply message passing algorithm on a factor graph, which at the same time risen the complexity. In order to compare the complexity of the three, we have to consider the location of the query nodes (which strongly influences VE and partially influences FTI) and the amount of the query nodes (which strongly influences CTP), the location and amount of evidence nodes (which strongly influence VE) as well as the location and amount of hidden nodes (which strongly influence FTI and FR) : when the amount of hidden nodes is relatively large, the complexity of FTI is lower than that of VE and CTP.We use Alarm Network and Asia Network to analyze the complexity of the four. In order to minimize the effect of error, we repeat each experiment for 10 times, then compute the average number. In Alarm Network, we firstly analyze the effectiveness of single query and multiple queries using CTP. Result shows that the effectiveness of multiple queries is higher than that of a single query. Secondly, we analyze the effectiveness of the three.The experimental results show that at least the time required for VE, followed by the CTP, the time required for the most FTI. Because only one network among variables, FTI preserve intermediate results are rather limited, in the follow-up enquiries in the double counting of the savings will be relatively small. But FTI in the need for the application of the factor graph information propagation algorithm, the introduction of additional data structure, increasing the complexity. Thus FTI is in the minimum efficiency in the network. At the same time, we can see that three of CTP in the slope of the smallest, followed by the approximate equivalent VE and FTI. Obviously the smaller the mean slope answer each query the shorter the time, that is to say CTP in the follow-up enquiries, the shortest time, followed by the FTI and VE. In Asia Network, we firstly analyze the effectiveness of the four when the location of hidden nodes differs. Result shows that it has no influence on VE and CTP at all, but it does influence FTI.Secondly, we analyze the effectiveness of the three when the amount of hidden nodes differs. Result shows that it has no influence on VE and CTP too, but it does influence FTI, provided other factors are fixed, the larger amount of hidden nodes there are in the network, the more effective FTI is.Finally, we analyze the effectiveness of the four when the location and amount of the query and evidence nodes differs. Result shows that FTI is affected in a small way, while VE and CTP are strongly affected. The more closer the query nodes are to the root nodes, the more closer the evidence nodes are to the leaf nodes, and the larger amount of evidence nodes there are in the network, the more effective VE is. When it comes to CTP, the larger amount of query nodes there are in the network, the more effective CTP is. To sum up, we claim that when the query and evidence nodes are restricted in a subset of the set of the nodes, compared with VE and CTP, FTI is the most effective exact inference algorithm to deal with multiple queries in such Bayesian networks.

Keywords/Search Tags:

Implementation

PDF Full Text Request

Related items

1	Research On Fast Implementations Of Multiplications In Finite Field And Ring
2	Research On Risk Index System Construction And Control Strategy Of ERP Implementation In Small And Medium-sized Enterprises Based On BPM
3	Implementation Of Modular Multiplication Over Fp
4	Design And Implementation Of Lightweight Block Cipher In The Internet Of Things
5	The Implementation Of FPGA-Based Digital Watermarking Algorithm
6	Research And Implementation Of ARIS-Based ERP Implementation In Graphic Equipment Manufactory
7	Research And Implementation Of Image Rendering And CGM Processing Of Seismic Interpretation System
8	Improvement And Implementation Of Factor Tree Inference Algorithm For Bayesian Networks
9	Research And Implementation Of Parallel Clustering Ensemble Based On Linux Cluster
10	Research And Implementation Of Personal Customer Marketing Management Of Commercial Bank