Study On Applications Of Complex Network And Cellular Automata In Bioinformatics

Posted on: 2008-11-14 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y B Diao | Full Text: PDF
GTID: 1100360242464069 | Subject: Analytical Chemistry
Abstract/Summary:
With the rapid development of modern biological techniques, bioinformatic data and resources are growing explosively. Meanwhile, improvements in computing power and the development of the World Wide Web have made it possible to store, process, and transmit massive amounts of data. To organize, manage, and further exploit known biological information, a discipline at the intersection of life science and information science has emerged, and it has greatly promoted related research in molecular biology and computer-based information management.

Since its birth, bioinformatics has passed through three eras: the pre-genomic era, the genomic era, and the post-genomic era. Representative work of the pre-genomic era comprises the establishment of biological databases, the development of indexing tools, and DNA and protein sequence analysis; the genomic era is marked by the discovery and identification of genes, the establishment of web database systems, and the development of interface tools; the post-genomic era is characterized by large-scale analysis in genomics and proteomics and by the comparison and integration of bioinformatics data. This dissertation attempts to introduce complex systems theory into bioinformatics and obtains significant results on several important questions of the post-genomic era.

A defining characteristic of a complex system is nonlinearity, which also means that "the whole is not equal to the sum of its parts". The molecular basis of this nonlinearity is the pervasive and intricate interactions among all sorts of biomacromolecules, genes, and proteins. In most cases, these biomacromolecules never act or perform their biological functions alone, but have many direct or indirect relations with one another, which may be physical or chemical in nature. It is these relations that give rise to various biological networks, such as metabolic networks, gene regulatory networks, and signal transduction networks.
Ultimately, all activities of life rest on the structure and function of these networks.

Constructing a biological network is the groundwork for static geometric analysis of the network, dynamical analysis, identification of important vertices, discovery of network regulation strategies, and numerical experiments and simulation. Through the Simple Object Access Protocol (SOAP), we use the web service provided by KEGG to extract the signal transduction data of Homo sapiens. By transforming these data into adjacency matrices and then combining the matrices through matrix operations, we construct an undirected graph of the cellular signaling network of Homo sapiens, which contains 931 nodes and 6798 links in total. Computing the degree distribution, we find that it is not a random network but a scale-free network following a power law $P(k) \sim k^{-\gamma}$, with $\gamma$ approximately equal to 2.2.

Since this network is shown not to be a random network, it is worthwhile to investigate whether it has community structure. Among three graph partitioning algorithms, Guimera's simulated annealing method is chosen to study the details of the topological structure and other properties of this cellular signaling network, as it shows the best performance. To reveal the underlying biological implications, further investigation is conducted on specific communities. Finally, the potential impact on basic research and drug discovery is discussed.

In Part Two of this dissertation, we use cellular automata (CA) to construct a discrete model of biological sequences and to predict the topology of transmembrane proteins, which play critical roles in cellular signaling networks. A CA is a dynamical system that is discrete in both time and space. Its cells are arranged on a regular lattice; each cell takes one of a finite number of discrete states and is updated synchronously according to an explicit local rule.
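As a minimal sketch of the CA definition above, the following Python code evolves a one-dimensional binary CA with synchronous updates. The radius-1 neighborhood, the Wolfram rule numbering, the periodic boundary, and the names `step` and `evolve` are illustrative assumptions; the abstract does not state which rule or lattice the dissertation uses.

```python
from typing import List


def step(cells: List[int], rule: int) -> List[int]:
    """Apply one synchronous update of an elementary (radius-1) binary CA.

    `rule` is a Wolfram rule number in 0..255 (an illustrative choice).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[(i - 1) % n]   # periodic boundary (assumption)
        center = cells[i]
        right = cells[(i + 1) % n]
        # The 3-cell neighborhood selects one bit of the rule number.
        idx = (left << 2) | (center << 1) | right
        nxt.append((rule >> idx) & 1)
    # All new states are computed from the old lattice, so the update
    # is synchronous: no cell sees a neighbor's new state.
    return nxt


def evolve(cells: List[int], rule: int, steps: int) -> List[List[int]]:
    """Return the initial lattice followed by `steps` successive updates."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history
```

Each new state depends only on a cell and its two neighbors at the previous time step, which is the sense in which the rule is local in both time and space.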
In a CA, the evolution of the entire dynamical system is driven by simple, well-defined interactions among the cells. Its defining characteristics are that time, space, and state are all discrete, every variable takes only a finite number of states, and the state-transition rule is local in both time and space.

The present study develops an integrative CA-based method for predicting the topology of transmembrane proteins. First, the query protein sequence is scanned with a fixed-size window of 20 amino acid residues; then, the segments thus obtained are transformed into binary sequences by an encoding procedure, to which cellular automata are applied to derive pseudo amino acid components; finally, the augmented covariant discriminant algorithm is used to predict the topology of the query protein. The results suggest that this method is an effective tool for predicting both α-helical and β-barrel proteins with high accuracy, as validated by the jackknife cross-validation test. Moreover, because it is based solely on the amino acid sequence, the method requires no other annotations or sequence alignment information, which indicates that it could be a promising high-throughput tool for similar problems in the post-genomic era. It has also not escaped our attention that this method might be used to improve the prediction quality for a series of other protein attributes, such as subcellular localization, enzyme family classes, G protein-coupled receptor classification, and protein quaternary structure types, among many others.
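The first two steps of the method (window scanning and binary encoding) can be sketched in Python as follows. The window size of 20 comes from the abstract; everything else is an assumption made for illustration, including the stride of 1, the helper names `windows` and `encode`, and in particular the 5-bit code, which is a placeholder rather than the encoding table used in the dissertation.

```python
from typing import Iterator, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
WINDOW = 20  # window size stated in the abstract

# Hypothetical 5-bit code: each residue maps to its index in AMINO_ACIDS.
# This is a placeholder, NOT the encoding used in the dissertation.
CODE = {aa: format(i, "05b") for i, aa in enumerate(AMINO_ACIDS)}


def windows(sequence: str, size: int = WINDOW) -> Iterator[str]:
    """Yield every contiguous segment of `size` residues (stride 1, assumed)."""
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]


def encode(segment: str) -> List[int]:
    """Concatenate the per-residue codes into one list of 0/1 cell states."""
    return [int(bit) for aa in segment for bit in CODE[aa]]
```

Under this placeholder code, each 20-residue window becomes a list of 100 binary values that could serve as the initial configuration of a CA. How the pseudo amino acid components are derived from the evolved lattice is not described in the abstract and is not sketched here.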
Keywords/Search Tags: Complex system, Scale-free network, Information theory, Cellular automata, Cellular signaling network