Learning Bayesian Networks In The Presence Of Missing Values Based On Kernel Independent Component Analysis

Posted on: 2017-01-11 | Degree: Master | Type: Thesis
Country: China | Candidate: G H Wu | Full Text: PDF
GTID: 2308330482979473 | Subject: Computer Science and Technology
Abstract/Summary:
The 21st century is an age of data and information. Databases and information systems have improved our ability to analyze data and make decisions. As the volume of data grows rapidly, extracting useful information from large-scale data has become an important research topic.
A Bayesian network (BN) is a tool for reasoning under uncertainty using probability. It is a probabilistic graphical model that combines probability and statistics with graph theory. Bayesian networks express the causal relationships between nodes explicitly and can estimate the probability of uncertain events from existing data. One of their advantages is that they can incorporate prior knowledge during learning.
In biological experiments, missing data arise for many reasons, and mishandling them can distort the conclusions of the analysis. The simplest approach is to discard every sample that contains a missing value, but this may throw away information needed for important results. Existing methods for learning Bayesian networks from data with missing values mainly impute the missing entries through iterative computation, which requires long running times.
This thesis first introduces the background of Bayesian theory and Bayesian networks and then analyzes their basic definitions and theory. It then proposes a new Bayesian network scoring function based on kernel independent component analysis and incomplete Cholesky decomposition. This scoring function scores a network structure directly on data with missing values and avoids the imputation step, which improves both computational efficiency and the reliability of the result.
Building on this theoretical analysis, we implement the scoring function. We also present a Bayesian network structure learning algorithm based on it and report experimental results on heterogeneous deep sequencing data.
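The abstract names incomplete Cholesky decomposition as an ingredient of the scoring function but gives no details. The sketch below shows the standard pivoted incomplete Cholesky factorization of a Gaussian kernel matrix. Kernel ICA methods typically use this factorization to obtain a low-rank factor G with K ≈ GGᵀ without ever forming the full n×n matrix. It is not the thesis's code. The Gaussian kernel, the bandwidth `sigma`, the trace tolerance `tol`, the one-dimensional input, and the function names are assumptions made for illustration, and the sketch does not model missing values.

```python
import numpy as np


def gaussian_kernel_column(x, i, sigma):
    """Column i of the Gaussian kernel matrix: k(x_j, x_i) for every j."""
    return np.exp(-((x - x[i]) ** 2) / (2.0 * sigma ** 2))


def incomplete_cholesky(x, sigma=1.0, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky of a Gaussian kernel matrix (hypothetical helper).

    Returns (G, pivots) with K ~= G @ G.T, where G has m <= max_rank columns.
    Only the m pivot columns of K are ever computed.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    max_rank = n if max_rank is None else max_rank
    G = np.zeros((n, max_rank))
    d = np.ones(n)  # diagonal of the residual; k(x, x) = 1 for a Gaussian kernel
    pivots = []
    m = 0
    # Stop once the trace of the residual K - G G^T drops below tol
    while m < max_rank and d.sum() > tol:
        i = int(np.argmax(d))  # greedy pivot: largest remaining residual
        pivots.append(i)
        col = gaussian_kernel_column(x, i, sigma)
        # New column of G: residual of K[:, i] divided by sqrt(residual at pivot)
        G[:, m] = (col - G[:, :m] @ G[i, :m]) / np.sqrt(d[i])
        # Update the residual diagonal; clip tiny negatives caused by rounding
        d = np.maximum(d - G[:, m] ** 2, 0.0)
        m += 1
    return G[:, :m], pivots


x = np.linspace(-2.0, 2.0, 50)
G, pivots = incomplete_cholesky(x, sigma=1.0, tol=1e-6)
```

The residual K − GGᵀ is positive semidefinite. Stopping when its trace falls below `tol` therefore also bounds every entry of the approximation error by `tol`. With rank m much smaller than n, downstream kernel computations cost roughly O(nm²) instead of O(n³). The abstract does not say how the thesis feeds such a factor into its score.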
We also present a program that scores Bayesian network structures on heterogeneous deep sequencing data containing missing values. The program can convert a directed acyclic graph (DAG) into a partially directed acyclic graph (PDAG) according to the equivalence classes of Bayesian networks and the feedback phenomena of biological networks. Finally, we compare our results with the standard (reference) network using the Jaccard coefficient, which demonstrates the effectiveness of our algorithm.
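The abstract says the program converts a DAG into a PDAG and evaluates results with the Jaccard coefficient, but it does not give the procedure. The sketch below is a minimal illustration under stated assumptions. It keeps an edge directed only if it takes part in a v-structure (a → c ← b with a and b non-adjacent) and leaves all other edges undirected, which yields the pattern of the DAG's equivalence class. It omits Meek's orientation rules, which a full CPDAG construction would also apply, and it does not model the feedback phenomenon the thesis mentions. The Jaccard helper compares edge sets. Comparing undirected skeletons, as in the usage lines, is an assumption about how learned and reference networks are matched. All function names are hypothetical.

```python
from itertools import combinations


def dag_to_pattern(nodes, edges):
    """Split a DAG's edges into (directed, undirected) sets of its pattern PDAG.

    An edge stays directed only if it belongs to a v-structure a -> c <- b
    with a and b non-adjacent. Meek's propagation rules are omitted here.
    """
    edges = set(edges)
    adjacent = {frozenset(e) for e in edges}
    parents = {n: {u for (u, v) in edges if v == n} for n in nodes}
    directed = set()
    for c in nodes:
        for a, b in combinations(parents[c], 2):
            if frozenset((a, b)) not in adjacent:  # unshielded collider
                directed.update({(a, c), (b, c)})
    undirected = {frozenset(e) for e in edges if e not in directed}
    return directed, undirected


def jaccard(a, b):
    """Size of the intersection over size of the union; two empty sets score 1."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def skeleton(edges):
    """Undirected version of an edge list, for orientation-free comparison."""
    return {frozenset(e) for e in edges}


reference = [("A", "C"), ("B", "C"), ("A", "D")]
learned = [("A", "C"), ("B", "C"), ("C", "D")]
directed, undirected = dag_to_pattern(["A", "B", "C", "D"], reference)
print(sorted(directed))  # [('A', 'C'), ('B', 'C')]
# 2 shared skeleton edges out of 4 distinct edges
print(jaccard(skeleton(reference), skeleton(learned)))  # 0.5
```

In the example, A → C ← B is a v-structure, so both of its edges stay directed, while A − D stays undirected.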
Keywords/Search Tags: Bayesian networks, missing data, kernel independent component analysis, incomplete Cholesky decomposition