Font Size: a A A

Research On A Structural Learning For Genetic Regulatory Network Based On Kernel Conditional Independence Cluster Permutation Test Algorithm

Posted on:2018-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:X GuoFull Text:PDF
GTID:2310330518489429Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the arrival of the intelligent era, a variety of large data information systems are established, which need humans to make data analysis and decisions on the basis of large data. Due to the expansion of data volume, how to dig out effective information from large-scale data has become an important research topic. Similarly, how to solve the problem of the algorithm caused by dimension explosion is also an important issue.Bayesian Networks (BN), as a probabilistic graphical model that combines probability statistics with graph theory, is one of the theoretical models of uncertainty in knowledge representation and reasoning, which has become a hot topic in recent years since 1988.The Bayesian Networks clearly expresses the causal relationship between the nodes, and it can use the existing data to analyze the probability of occurrence of uncertain events.In terms of conditional independence test, Gary bases on the idea of Kernel-based two-sample test, uses the MMD (maximum mean discrepancy) algorithm to measure the variance of the distribution, and then uses the P-value statistical hypothesis to get the KCIPT algorithm. The KCIPT algorithm is proposed to test the independence of the condition. Based on Gary's KCIPT algorithm, this paper uses Bayesian Networks theory to combine kernel method with statistical hypothesis test. Through a large number of experiments and model improvements, a new algorithm KCICPT is established based on KCIPT algorithm. The algorithm solves the problem that the KCIPT algorithm is not suitable for big-sample data. The main work of this paper is as follows:(i) By using K-means clustering, the time complexity of KCIPT algorithm for permutation optimization on big-sample data is reduced. We will replace the clustered results according to the results of each classification, and then combined the results into the overall replacement results, which reduces the displacement optimization time as a whole. The time complexity of permutation optimization is reduced from O (N3) to O(nm3),where n and m are much smaller than N. The size of N is the number of samples of the data. The m is the upper limit of samples in the cluster.(ii) The Gram matrix is processed by matrix decomposition, thus we obtain the expression of low time complexity of MMD, which reduces the time complexity of calculating the MMD of KCIPT. The time complexity is reduced from O(N2) to O(m2N), where m is the number of Cholesky matrices, it is much smaller than N.(iii) In order to reduce the computational time of P-value, we solve the time complexity of MMD value and use Gaussian sampling to obtain Gaussian distribution. We get the MMD value directly by its mean, which greatly reduces the complexity of time. The time complexity is reduced from O(MV) to O(m),where m is the number of shuffling number, it is much smaller than N. The M is the Monte carlo iteration times.In summary, the innovation of this paper is mainly embodied in three aspects. The first aspect is to reduce the time complexity of permutation optimization by means of clustering method. The second aspect is to reduce the time complexity of the calculation of the MMD value by means of matrix decomposition. The third aspect is to reduce the time complexity required for calculating P-value by training data sampling.Experiments show that the network data of continuous type can be deduced on the data set of Sachs (2005). The algorithm can effectively deduce the network structure.The reproducing effect of the directed regulation network deduced by KCICPT is better than that deduced by the traditional non-kernel method.
Keywords/Search Tags:PC algorithm, Bayesian Networks, Causal Inference, Kernels Methods, Hypothesis Testing, Clustering, Matrix Decomposition
PDF Full Text Request
Related items