Research Of Application Foundation On Bayesian Networks

Posted on: 2008-09-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Dong
GTID: 1118360212497911
Subject: Communication and Information System
Abstract/Summary:
A Bayesian network is a probabilistic graphical model that represents the relationships between variables and provides an effective, natural way to express causal relationships. It is one of the most effective theoretical models for representing uncertain knowledge, because it supports strong probabilistic reasoning and is easy for humans to understand. These strengths give it a particular advantage in data mining, where it has become one of the most studied topics. In this dissertation, we first review the history of data mining, its types and methods, the development and current state of research on Bayesian networks, Bayesian classifiers, and applications of Bayesian networks. We then study the learning of Bayesian network structure from continuous variables, Bayesian classifiers and their applications, and visualization methods. The details are as follows:

1. Learning a Bayesian network involves learning both its structure and its parameters. Based on a discussion of the predictive ability between continuous variables and how to compute it, we propose a predictive-ability-based structure learning method for continuous Bayesian networks. The method has two steps, each of which includes cycle checks. First, an initial Bayesian network structure is created; then the initial structure is adjusted by adding missing arcs, removing redundant arcs, and correcting the direction of wrongly oriented arcs. We also ran comparative experiments on simulated data. The method adopts a center adjacent coverage technique, which avoids discretizing the continuous variables, reordering the variables, and assuming a normal or mixed normal distribution.
It is an effective way to learn the structure of a continuous Bayesian network, and it can build the correct structure, reflecting the dependency relationships among the variables in the data.

2. By introducing a genetic algorithm into the construction of Bayesian classifiers, we propose GBAN, a genetic-algorithm-based algorithm for constructing constrained Bayesian network classifiers. The algorithm uses a genetic algorithm to learn the structure, which reduces the complexity of learning the structure of a Bayesian network. We also extend the structure of the tree-augmented naïve Bayesian network (TAN) to obtain a constrained Bayesian network classifier. For learning the structure of this classifier, we design a fitness function based on the log-likelihood, together with an encoding scheme for the network structure and the corresponding genetic operators. As a result, the algorithm converges to the globally optimal structure. Experimental results show that GBAN performs well when the relationships between the attributes of a data set are relatively complicated.

3. Bayesian network classification is an important approach to the classification problem. Because learning an unrestricted Bayesian network classifier is highly complex, we look for a constrained Bayesian network classifier structure. The naïve Bayesian classifier is simple and effective, but its performance is sometimes poor because it assumes conditional independence between variables. To address this problem, after analyzing other forms of Bayes' theorem and applying the divide-and-conquer method, we propose a multi-module integration Bayesian network classifier (MSIB), which is constructed by our MSIB algorithm. MSIB integrates several mixed naïve Bayesian (MNB) classifiers, each of which is built on the output of an information-entropy-based attribute set division algorithm (FDBE).
FDBE divides the attribute set into several independent attribute subsets. For each subset, we create a sub-classifier (an MNB) and its corresponding conditional probability table, and we then obtain MSIB by integrating the sub-classifiers and their conditional probability tables through Bayes' theorem. The MNB classifier is an improvement on TAN: it improves the selection of the leaves of the TAN classifier and captures the relationships among child nodes given that the class is known. In other words, MSIB divides one large classifier into several independent MNB classifiers and then combines them. Theoretical analysis and experiments show that the MSIB classifier achieves better classification results.

4. Bayesian network classifiers are used to solve practical problems. We apply our classification method in a medical image analysis system that identifies particles in urine sediment images. Because the images contain a lot of noise and have uneven gray levels, they must be pre-processed. Pre-processing uses mathematical morphology methods: edge extraction, binarization of the gradient image, erosion, dilation, and segmentation into single-cell images. We then extract features such as shape, gray level, and texture, and finally obtain a classifier that can classify the particles in urine sediment images. Experiments show that this approach to image classification is effective.

5. In data mining, visualization presents abstract information in a simple way. Its purpose is to exploit the human ability to obtain information from visual models and structures, so that we can guide the data mining process and better understand the mining results. Visualization includes data visualization, data mining process visualization, and data mining result visualization.
Data visualization is carried out before data mining rather than during it. Data mining process visualization is carried out during mining and helps people mine data more efficiently: users interactively obtain analysis information about the data they are interested in, rather than merely querying the data. Data mining result visualization helps people understand the mining results by presenting the information in an understandable way. Building on this research, we integrate visualization techniques into our data mining system, DBIN Miner, as a module with good extensibility. By extending the drawing of basic graphics, the module can visualize data as scatter plots and statistical charts for observing data, provide process visualization for controlling the mining process, and provide result visualization for the results of clustering, regression, decision trees, and Bayesian methods.
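
Item 3 of the abstract notes that the naïve Bayesian classifier assumes conditional independence between attributes given the class. The sketch below illustrates only that baseline assumption on categorical data; it is not the dissertation's MSIB, MNB, or GBAN algorithm. The function names, the use of Laplace smoothing, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = frequency
    value_counts = defaultdict(Counter)
    # Distinct values seen for each attribute, used for Laplace smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def classify(model, row):
    """Return the class with the highest log posterior under the naive assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum over attributes of log P(attribute = value | class)
        score = math.log(count / total)
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Laplace (add-one) smoothing: an assumption, not taken from the source
            score += math.log((seen + 1) / (count + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for illustration: (shape, texture) -> particle type
rows = [("round", "smooth"), ("round", "rough"), ("long", "smooth"), ("long", "rough")]
labels = ["cell", "cast", "cell", "cast"][:0] or ["cell", "cell", "cast", "cast"]
model = train_naive_bayes(rows, labels)
prediction = classify(model, ("round", "smooth"))
```

Because each attribute contributes an independent term to the score, any dependency between attributes is ignored; this is the limitation that motivates grouping attributes into sub-classifiers in the approach the abstract describes.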
Keywords/Search Tags: Data mining, Bayesian network, Bayesian network classifier, continuous Bayesian network, naïve Bayesian, classifier, genetic algorithm, Visual Data Mining