
Gene Networks Identification Using Independence Measurement Based On The Hilbert Space

Posted on: 2015-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Jin
Full Text: PDF
GTID: 2250330428463974
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Bioinformatics is the discipline concerned with processing biological data. It supports pathological research through accurate data analysis and model building, and it can reasonably be expected to drive the development of predictive, individualized, and systematic medicine, with a far-reaching impact on biomedicine. The gene regulatory network is one type of biological network and a major research topic in post-genome informatics; it is a complex network reconstructed with bioinformatics techniques through data analysis, modeling, and inference. Several methods have been proposed for computationally reconstructing regulatory networks, such as Boolean networks, mutual information relevance models, differential equations, and Bayesian networks. This thesis proposes a novel reconstruction method for gene regulatory networks based on the Hilbert-Schmidt Independence Criterion (HSIC).

HSIC builds a covariance operator on a reproducing kernel Hilbert space, derives mathematically the relationship between covariance and independence (including conditional independence), and uses that relationship to identify the structure of gene networks. It does not rely on prior biological knowledge; rather than fitting the data, HSIC provides a criterion for measuring statistical dependence. It is also a nonparametric method that makes no assumption about the data distribution. Because of the limitations of available computational methods, the relationship between variables has traditionally been described by statistical correlation. The essence of a gene regulatory network, however, is the causal interaction between genes, so correlation alone cannot identify the structural relationships between genes.
Statistical independence comes closer to describing causality than goodness of fit, correlation, or model simplicity. The method defines its statistics on a reproducing kernel Hilbert space, which extends the statistical characterization of the original space to infinite dimensions, so that independence relationships between variables can be described more accurately.

The sufficient dimension reduction discussed in this thesis is a supervised learning method based on the theory of conditional independence. It turns dimensionality reduction into an optimization problem and yields two optimization criteria: the determinant method and the trace method. Simulation experiments show that both criteria reduce dimensionality effectively and that sufficient dimension reduction is well suited to practical applications.

To assess the structure identification ability of HSIC more comprehensively, the method is applied to three challenges from the DREAM project: DREAM2 Challenge 5, DREAM4 Challenge 2, and DREAM3 Challenge 4. The three challenges have different characteristics: DREAM2 Challenge 5 provides steady-state data, DREAM4 Challenge 2 provides time-series data, and DREAM3 Challenge 4 combines steady-state and time-series data. DREAM was chosen as the test bed because it evaluates the quality of models built for a biological system by comparing experimental results with theoretical inferences. The results indicate that HSIC has certain advantages in identification accuracy and computational efficiency, which gives a more complete verification of its ability to reconstruct gene networks.
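
As an illustration of the dependence measure described in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimate, trace(KHLH)/(n-1)^2, where K and L are kernel Gram matrices and H is the centering matrix. It is not taken from the thesis: the Gaussian kernel, the median-heuristic bandwidth, the toy data, and the function names `gaussian_gram` and `hsic` are all assumptions made for this example.

```python
import numpy as np

def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix of the samples in x (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption, not from the thesis).
        sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy data: y_dep depends on x nonlinearly, y_ind is independent of x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x ** 2 + 0.1 * rng.normal(size=200)
y_ind = rng.normal(size=200)

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
corr_dep = np.corrcoef(x, y_dep)[0, 1]  # linear correlation, for comparison

print(f"HSIC(x, x^2 + noise)   = {hsic_dep:.4f}")
print(f"HSIC(x, independent)   = {hsic_ind:.4f}")
print(f"Pearson r(x, x^2 + noise) = {corr_dep:.4f}")
```

In this toy setting the relationship between the two variables is purely nonlinear, which is the kind of dependence a linear correlation coefficient tends to understate. In a network reconstruction setting, one plausible use would be to evaluate such a score for every pair of gene expression profiles, although the scoring and thresholding scheme actually used in the thesis is not given in the abstract.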
Keywords/Search Tags: bioinformatics, gene regulatory network, model building, causality, conditional independence