Research On Computational Intelligence Method Of Gene Regulatory Network Modeling

Posted on: 2015-01-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Yang
GTID: 1268330431455304
Subject: Signal and Information Processing
Abstract/Summary:
With genome sequencing complete, research on the function of a single gene or protein cannot fundamentally reveal the laws governing the occurrence and development of biological phenomena. Over the past decade, systems biology has therefore become a central concern across many branches of biology. Systems biology is a new, rapidly developing interdisciplinary field that combines knowledge and skills from many disciplines, such as biology, chemistry, physics, mathematics, and computer science. Its purpose is to study the physiological mechanisms of biological systems from a system-level, global perspective. In recent years, modeling gene regulatory networks (GRNs) from gene expression data has become one of the effective means of systems biology research. Accurately constructing GRNs will greatly affect the accuracy of disease treatment, deepen understanding of cellular activities and of the functional mechanisms of causative genes, and have a profound impact on the prevention, diagnosis, and treatment of complex diseases. Although some progress has been made, a gene regulatory network is a large, complex system characterized by strong coupling, randomness, time variation, and strong nonlinearity. Existing methods are too simple to identify transcriptional regulatory relationships among genes accurately, and they produce too many false-positive relationships. How to build a precise model of gene regulation effectively is currently an active research topic.

This dissertation used computational intelligence methods to mine gene expression data, reconstruct gene regulatory networks, and model the biochemical reactions involved in gene regulation. The methods were applied to microarray gene expression profiles obtained from coronary atherosclerotic plaques. The main contributions and innovations of the thesis are as follows.

1.
Modeling from the macro perspective.

Given the poor performance of existing models, the flexible neural tree (FNT) model was proposed to construct gene regulatory networks and to forecast time series from gene expression profiles. A tree-structure-based evolutionary algorithm similar to genetic programming was used to optimize the hierarchical structure of the FNT model, and simulated annealing was used to evolve the parameters encoded in that structure. The two optimization algorithms were applied alternately, and the loop continued until a satisfactory solution was found or the iteration limit was reached. To improve the accuracy of the gene regulatory networks, the Akaike information criterion (AIC) and majority voting were used to identify the minimal set of regulatory elements of a target gene. Experimental results showed that, compared with the Elman neural network, fuzzy neural network, RBF neural network, recurrent neural network, fuzzy recurrent neural network, and an ensemble of these models, the FNT model forecast gene expression profiles more accurately and reconstructed networks more accurately.

All existing methods of inferring gene regulatory networks have strengths and weaknesses. Compared with a single model, a combination of multiple models is more accurate and stable for constructing gene regulatory networks, and it is also the current research trend. This dissertation presented, for the first time, a method that combines multiple models, namely RMIHM (Gene Regulatory Network Reconstruction Based on Mutual Information and Hybrid Models). In this method, linear and nonlinear models were each used to construct a gene regulatory network, and the overall network integrated the topologies obtained from both. Additive tree models were proposed to encode the linear and nonlinear models; genetic programming and particle swarm optimization were used, respectively, to evolve and to evaluate each additive tree model.
The fitness function contained a sparsity coefficient and a correlation coefficient. The sparsity coefficient enforced the condition that only a tiny fraction of the candidate regulators of each target gene are true regulators. The correlation coefficient used mutual information from information theory to evaluate the correlation between gene pairs, so as to select the most relevant regulatory factors for each target gene. Experimental results showed that the method was more accurate than classical single-model methods: the true positive rate was higher and the false positive rate was lower.

2. Microarray data processing, regulatory pathway construction, and features of the inter-chromosomal distribution of disease-related genes in the human genome.

All coronary atherosclerotic plaque samples and normal coronary artery tissue samples in this work were provided by the tissue bank at Qilu Hospital and Liaocheng People’s Hospital. The Human Genome U133 Plus 2.0 Array (Affymetrix) was used to build gene expression profiles of the atherosclerotic plaques and the normal tissue samples. By comparing the two kinds of expression profiles, 1,104 differentially expressed genes were identified. These genes were analyzed with GO functional classification and pathway analysis in order to understand their biological functions and pathways. GO analysis found that the genes differentially expressed in coronary atherosclerosis were involved in multiple biological functions, such as cell adhesion and biological adhesion. Pathway analysis revealed that the genes were significantly enriched in the focal adhesion pathway. The gene regulatory network reconstruction method based on mutual information and hybrid models (RMIHM), introduced in the fourth chapter, was then used to predict the regulatory relationships among the differentially expressed genes in the focal adhesion pathway.
We correctly predicted the Rho kinase regulatory mechanisms, which demonstrated the effectiveness of the approach.

The dissertation collected genomic data for model organisms including human, mouse, zebrafish, fruit fly, and C. elegans; disease-related protein-coding genes for 14 diseases, together with related data; and leukemia-associated mutations. By analyzing the spatial inter-chromosomal distribution of genes, we found that it displayed a heterogeneous pattern. Disease-associated protein-coding genes had a similar inter-chromosomal distribution pattern, and genes involved in certain biological processes tended to be enriched on one or a few chromosomes. Human chromosome 19 had the highest or second-highest frequency of harboring disease-associated protein-coding genes, which might be related to the fact that this chromosome harbors more genes involved in transcriptional regulation. These findings could help improve the efficiency of disease-associated gene screening studies, such as GWAS, genome-wide linkage analysis, and whole-genome sequencing, by targeting specific chromosomes.

3. Modeling from the microscopic and stochastic perspectives.

Gene regulation involves a large number of biochemical reactions. Discreteness and stochasticity may play important roles, particularly in systems where molecular species are present in low numbers or interact slowly. This dissertation presented a new modeling approach for the automated design of stochastic and delayed stochastic biochemical reactions. An additive reaction model was proposed to encode the chemical reactions, integrating stochastic, discrete, and delayed modeling into one computational framework for the first time. A genetic algorithm and a particle swarm optimization algorithm were combined as a nested hybrid evolutionary strategy to identify the structure and parameters of the model.
Experimental results showed that the additive reaction model and the nested hybrid evolutionary strategy could accurately identify stochastic and delayed stochastic biochemical reaction models.
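
To make the discrete, stochastic view of biochemical reactions in part 3 concrete, the following is a minimal sketch of a standard Gillespie-style stochastic simulation for a single molecular species that is produced at a constant rate and degraded at a rate proportional to its copy number. It is only an illustration of discrete stochastic kinetics under assumed parameters. It is not the additive reaction model or the nested hybrid evolutionary strategy of the dissertation, and it does not model delays. The function name and the rate constants are hypothetical.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n).

    Illustrative only: the reaction scheme and parameters are assumptions,
    not taken from the dissertation.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * n        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=50.0)
    print(len(times), "events; final count:", counts[-1])
```

With low copy numbers, repeated runs with different seeds give visibly different trajectories, which is the kind of variability a deterministic rate equation cannot capture.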
Keywords/Search Tags: Data mining, Gene regulatory networks, Intelligent computing, DNA microarray, Biochemical reactions