An Algorithm Of Parameter Learning In Bayesian Network From Incomplete Data

Posted on: 2004-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Dong
Full Text: PDF
GTID: 2168360122460359
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, data mining techniques are increasingly applied in practice. Bayesian networks, a strong model for representing knowledge and handling data under uncertainty, are a powerful tool for data mining. The real world contains large amounts of data, so how to process these datasets and discover the knowledge in them is a problem that urgently needs solving.

A Bayesian network efficiently represents knowledge and supports probabilistic inference, and it is a popular graphical tool for decision analysis. In recent years, researchers have studied how to learn Bayesian networks directly from data and have begun to apply them to data mining. Although data mining technology is still maturing, it has already achieved notable results on some data modeling problems.

Bayesian networks pose two problems: learning and inference. Real-world data are often inexact or incomplete, so learning the parameters and structure of a Bayesian network from data has great practical value. Learning parameters accurately from incomplete data is in fact very difficult. Existing algorithms for this problem use approximate approaches that require many iterations and replacements, so they are inefficient and consume considerable system resources. In this thesis we propose a consistency-based algorithm called BCL, which can be applied to learn the parameters of Bayesian networks from incomplete data.

The new algorithm is based on the convergence and normal-distribution properties of consistent Bayesian learning. Hu Zhenyu's postgraduate thesis gives the following result: if regularity conditions hold, and [...], the posterior probability that [...], namely [...], tends to [...] in probability as [...] (here [...] is the parameter). This result tells us that as the number of observed samples tends to infinity, the parameter learned by the Bayesian method tends to a normal distribution.
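The asymptotic behaviour described above can be illustrated numerically. The following is a minimal sketch, not the result from Hu Zhenyu's thesis: it assumes a single Bernoulli parameter with a uniform Beta(1, 1) prior and idealised data, and measures how far the Beta posterior is from a normal density with the same mean and variance. The function names, the true value `theta=0.3`, and the grid size are all assumptions made for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed in log space for stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def l1_gap(n, theta=0.3, grid=20000):
    """Integrated |posterior - matched normal| over (0, 1) after n observations."""
    successes = round(theta * n)              # idealised data (assumption)
    a, b = 1 + successes, 1 + n - successes   # posterior under a Beta(1, 1) prior
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):                     # midpoint rule on (0, 1)
        x = (i + 0.5) * step
        total += abs(beta_pdf(x, a, b) - normal_pdf(x, mean, var)) * step
    return total

# The gap shrinks as the sample size grows.
gaps = [l1_gap(n) for n in (10, 100, 1000)]
```

In this toy setting the gap shrinks as the sample size grows from 10 to 1000, which is the qualitative behaviour the abstract relies on. It says nothing about the regularity conditions or rate stated in the cited thesis.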
Because the form of the parameter's distribution is fixed, we can use it to estimate the parameter's value. Since the algorithm is applied to incomplete data, the way the dataset is repaired affects the accuracy of the result, so this step must be handled carefully. We use a Bayesian heuristic approach and try to incorporate the influence of prior information into the repair procedure, as follows: first we estimate the parameters from the complete part of the dataset; then we use the formula [...] to complete the dataset.

The algorithm has two key points: (1) how to recover the dataset, and (2) how to estimate the parameters. This thesis analyzes both and proposes a feasible algorithm, BCL, which consists of the following steps:

Step 1: Extract the corresponding complete data from the dataset and use formula 2.11 together with the Bayesian heuristic approach to estimate the parameter probability vector; that is, use the partial data directly to compute the parameter, which follows a normal distribution.
Step 2: Taking these initial parameters as given, repair the remaining incomplete data using the formula, in preparation for estimating the best-fitting values of the full parameter vector.
Step 3: Use the completed data to estimate the final values with the matrix method.

In the experimental stage, we use two classic Bayesian networks, Asia and Alarm (both have been applied successfully in medical expert systems). We use the BCL algorithm and two other methods, Gibbs sampling and EM, to learn the parameters of the two networks.
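The three-step shape of the procedure (estimate from complete records, repair, re-estimate) can be sketched as follows. This is only an illustration under strong assumptions, not the BCL algorithm itself: the abstract does not give formula 2.11, the repair formula, or the matrix method, so the sketch substitutes add-one smoothed counting for estimation and most-probable completion for repair. The two-node binary network A -> B, the function names, and the sample data are all hypothetical.

```python
from collections import Counter

def estimate(records, alpha=1.0):
    """Smoothed estimates of P(A=1) and P(B=1 | A=a) from complete (a, b) records.
    Stand-in for the thesis's estimation formulas, which are not given here."""
    n_a = Counter(a for a, _ in records)
    n_ab = Counter(records)
    p_a1 = (n_a[1] + alpha) / (len(records) + 2 * alpha)
    p_b1_given_a = {a: (n_ab[(a, 1)] + alpha) / (n_a[a] + 2 * alpha) for a in (0, 1)}
    return p_a1, p_b1_given_a

def joint(a, b, p_a1, p_b1_given_a):
    """Joint probability P(A=a, B=b) for the network A -> B."""
    pa = p_a1 if a == 1 else 1 - p_a1
    pb = p_b1_given_a[a] if b == 1 else 1 - p_b1_given_a[a]
    return pa * pb

def repair(record, params):
    """Fill None entries with the most probable completion under params.
    Stand-in for the thesis's repair formula (assumption)."""
    a, b = record
    candidates = [(x, y) for x in (0, 1) for y in (0, 1)
                  if (a is None or x == a) and (b is None or y == b)]
    return max(candidates, key=lambda r: joint(r[0], r[1], *params))

def three_step_sketch(data):
    complete = [r for r in data if None not in r]
    initial = estimate(complete)                    # Step 1: complete records only
    repaired = [repair(r, initial) for r in data]   # Step 2: repair the dataset
    return estimate(repaired), repaired             # Step 3: re-estimate on all data

# Hypothetical dataset: six complete records and three with a missing value.
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
        (1, None), (None, 0), (0, None)]
(p_a1, p_b1_given_a), repaired = three_step_sketch(data)
```

Unlike EM or Gibbs sampling, this sketch makes a single pass: parameters are estimated once, the data are repaired once, and the final estimate is computed once, which is consistent with the abstract's claim of avoiding many iterations.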
We then compare error rate and running time. The results indicate that with small samples our algorithm is more accurate while its running time is close to that of the other two algorithms, and with large samples its accuracy is close to theirs while its running time is much smaller. Based on the above study, we developed a Bayesian Networks based...
Keywords/Search Tags: Bayesian networks, parameter learning, incomplete dataset, Data Mining