Genome-wide Discovery And Analysis Of Missing Pathway Genes And Miniature Inverted-repeat Transposable Elements

Posted on: 2009-07-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Chen
GTID: 1100360272971453
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the development of biotechnology, large amounts of genomic data are now available for understanding genomic architecture. More and more genomes have been sequenced, and molecular biology has entered the so-called post-genomic era. We can now directly interrogate global properties such as base frequencies and repetitive content, obtain the genome-level distribution of any gene of interest, and understand biological pathways by comparing multiple related genomes. Many pathways have been built to study molecular mechanisms. However, many current pathways are incomplete and sometimes contain errors, and completing existing pathways at the genome level is a challenging problem in pathway research. Transposable elements are important genetic elements in the genome that can affect genome size and gene function, and identifying them gives more insight into genome evolution. In this thesis, genome-wide discovery and analysis of missing pathway genes and miniature inverted-repeat transposable elements (MITEs) are presented. The main research contents and innovation points of this thesis are as follows:

· Main Research Contents

For the missing pathway gene problem, we first introduce a powerful method for finding missing genes at the genome level using operon information, similarity information and phylogenetic profile information, which greatly improves on the effectiveness of existing results. We further introduce an algorithm that finds gene motifs in order to construct regulon information, and we incorporate this information into the search for missing genes. Our experiments show that the approach is highly effective and reveals more structural properties of pathways.

For the problem of finding miniature inverted-repeat transposable elements (MITEs), we present an algorithm that finds all possible MITEs at the genome level. The method is fast and effective, and it also provides further analysis of the MITEs it finds. We apply it to many prokaryotes and find many new functional and mapping properties of MITEs.
Our studies also reveal new properties of genome dynamics and gene function.

Chapter 1 provides a brief introduction to the basic concepts of biology, graph theory and computational complexity theory that are used in the thesis.

Chapter 2 presents a new method to identify missing genes in a pathway and to recruit new genes into a pathway, using homology information, operon information and phylogenetic profile information together for the first time. 185 genomes are carefully selected based on their evolutionary relationships and genome sizes. Operons are predicted for all the selected genomes, and homology is calculated for every pair of homologous genes. A large graph, called the "genome reference graph", is then constructed. Its vertices are the genes of the 185 genomes, and an edge exists between two vertices if and only if the corresponding genes are in the same operon or are homologs. Edge weights are assigned according to how each edge arises. For a specific pathway P in the target genome, we assume that some of its genes are known (normally these are identified by ortholog-based methods). Starting from the known genes, we calculate the shortest paths between them and all other genes in the target genome. Genes with shorter paths to the known genes are ranked higher as candidates for pathway P. The KEGG pathways for E. coli are used to validate the method. It achieves a positive predictive value (PPV) of 60% among the top 10 candidates (out of 4131) when the reference pathway contains 5 or more genes, and the PPV can reach 90% as the number of genes in the pathway increases. Parameter analysis shows that the method is very robust, and some of our negative predictions are validated by the most recent release of KEGG. Further analysis shows that many negative predictions often fall in the same other pathway, which offers new insight into how pathways are defined.

Chapter 3 gives a new algorithm for finding motifs, which explores some new strategies. First, based on the concept of a neighborhood set, a new probability matrix is defined that captures the target motifs effectively. Second, an iterative restart strategy is used, by which information from several similar motifs helps to detect the real motif. To demonstrate the effectiveness of the algorithm, we test it on several kinds of real biological sequences and compare its results with those of other recently presented algorithms. Simulation shows that the algorithm can effectively detect subtle motifs.

Chapter 4 combines the algorithm presented in Chapter 3 with gene motif information to predict missing pathway genes. We have already used operon information to connect genes within the same operon, but that leaves no connections between different operons. However, operons regulated by the same transcription factor (TF), which are called regulons in biology, are believed to be more functionally related. We therefore use the predicted motifs of each operon to find missing pathway genes. Based on the operon information, we extract the promoter sets of similar genes (or operons), and then use the method given in Chapter 3 to predict the motifs of each promoter set. We define a new distance between operons and combine the resulting distances with the primary results presented in Chapter 2.
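The shortest-path ranking described for Chapter 2 can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not say which shortest-path algorithm or which edge weights were used, so the sketch assumes Dijkstra's algorithm run from all known pathway genes at once, and the gene names and the weights (1.0 for a same-operon edge, 2.0 for a homology edge) are invented purely for illustration. A top-k PPV helper is included to show how a figure such as "PPV in the top 10 candidates" could be computed.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (gene_a, gene_b, weight) triples."""
    graph = defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
        graph[b].append((a, w))
    return graph


def rank_candidates(graph, known_genes, target_genes):
    """Rank target-genome genes by shortest-path distance to any known pathway gene.

    Multi-source Dijkstra: every known gene starts at distance 0.
    Genes that cannot be reached are left out of the ranking.
    """
    dist = {g: 0.0 for g in known_genes}
    heap = [(0.0, g) for g in known_genes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    candidates = [g for g in target_genes if g not in known_genes and g in dist]
    return sorted(candidates, key=lambda g: (dist[g], g))


def ppv_at_k(ranked, true_members, k=10):
    """Fraction of the top-k ranked candidates that really belong to the pathway."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(g in true_members for g in top) / len(top)


# Toy data (hypothetical): "eco:" genes are in the target genome,
# "xyz:" genes are in a second reference genome.
edges = [
    ("eco:a", "eco:b", 1.0),  # same operon
    ("eco:b", "xyz:b", 2.0),  # homologs
    ("xyz:b", "xyz:c", 1.0),  # same operon in the reference genome
    ("xyz:c", "eco:c", 2.0),  # homologs
    ("eco:c", "eco:d", 1.0),  # same operon
]
graph = build_graph(edges)
target = ["eco:a", "eco:b", "eco:c", "eco:d", "eco:e"]
ranked = rank_candidates(graph, {"eco:a"}, target)
print(ranked)
print(ppv_at_k(ranked, {"eco:b", "eco:c"}, k=2))
```

In the toy graph, eco:c is reached from the known gene eco:a only by passing through the reference genome, which is the kind of indirect evidence a multi-genome graph is meant to capture. The unconnected gene eco:e never enters the ranking.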
The experimental results show that the predicted motif information is also useful and further improves the average PPV. For all pathways with more than 20 genes, the average PPV is 0.846 with this information, compared with 0.823 without it in Chapter 2.

In Chapter 5 we first present a web-based tool (http://csbl1.bmb.uga.edu/ffzhou/MUST/) to uncover and analyze MITEs at the genome-wide level. It finds all possible MITEs in a given sequence and classifies them into families according to the similarity of their terminal inverted repeats (TIRs) and IR. Furthermore, it performs automated analysis of the MITEs and outputs many related properties. We test this method on Anabaena variabilis ATCC 29413 and successfully find the MITE Nezha, which has been systematically studied. We also find another active MITE family in this genome. Moreover, we apply our search program to the genome of Haloquadratum walsbyi DSM 16790, which lives in extremely saline environments, and successfully find three possible MITE families: Duanwu, Qixi and Chongyang. Further analysis shows that Duanwu has left an obvious footprint of recent transposition and could have been an active MITE family very recently. In each MITE family, all copies have conserved TIR and direct repeat (DR) structures that show high similarity to each other. The conservation within the different MITE families and the high copy numbers suggest that there may have been recent MITE bursts in Haloquadratum walsbyi DSM 16790.
The MITE Uncovering SysTem (MUST) is fast and reliable in identifying MITEs in a given genome. Its application to two prokaryotic genomes, Anabaena variabilis ATCC 29413 and Haloquadratum walsbyi DSM 16790, suggests that many MITE families exist in prokaryotes. In particular, the MITE burst phenomenon found in Haloquadratum walsbyi DSM 16790 suggests that the occurrence and mobility of MITEs play an important role in cell function even in species living in extreme environments.

In Chapter 6 we identify, for the first time, a novel recently active MITE, Yuanxiao, with 19-bp TIR signals and 9-bp DRs in the four strains of Leptospira. Through the transposase encoded by ISLin1 in Leptospira strain Lai, Yuanxiao transposed in the common ancestor of all four sequenced strains of Leptospira, and it still retained very recent activity after the divergence of strains Lai and Copenhageni. A very recent burst of Yuanxiao transpositions is also observed in the four strains of Leptospira. Yuanxiao is the first recently active MITE identified in Leptospira, and it plays a role in regulating the neighboring genes.

In Chapter 7 we report, for the first time, the recently active MITE Chunjie in Geobacter uraniireducens Rf4, and propose that it may have proliferated through the transposase encoded by ISGur4, which has very similar TIR signals, since both were identified in Geobacter uraniireducens Rf4 and have almost identical copies with perfect DR signals.
The recent transposition of Chunjie is further confirmed by an insertion of Chunjie into an operon that was duplicated after the divergence of Geobacter uraniireducens Rf4 and its two completely sequenced close relatives, i.e. Geobacter metallireducens GS-15 and Geobacter sulfurreducens PCA. Interestingly, the structure of the operon does not seem to be disrupted by the insertion of Chunjie, compared with the other copy of the operon, which was duplicated before the insertion.

Chapter 8 concludes the thesis.

· Innovation Points of Thesis

Innovation point 1. An effective method is presented for finding missing pathway genes at the genome level by combining three information sources for the first time. A genome reference graph is constructed by comparing 185 genomes, and a graph algorithm is used to find the relations among genes. The method is very effective and robust. It greatly improves the pathway results and reveals more connections and discoveries within the target pathway and between pathways. Innovation point 1 can be found in Chapter 2.

Innovation point 2. To further improve the above method of finding missing pathway genes, we additionally use regulon information. We introduce a new motif finding algorithm and use it to find regulons at the genome level. By incorporating the regulon information, we give a more detailed analysis of the method presented in Chapter 2 and further improve the results of finding missing pathway genes. Innovation point 2 can be found in Chapter 3 and Chapter 4.

Innovation point 3. We present a web-based tool (MUST) to uncover and analyze MITEs (http://csbl1.bmb.uga.edu/ffzhou/MUST/) at the genome-wide level for the first time. Given a genome, this tool finds all possible MITEs and performs further analysis automatically. We observe for the first time the surprising MITE burst phenomenon in Haloquadratum walsbyi DSM 16790, which suggests that MITEs are involved in important cell functions even in species living in extreme environments. Innovation point 3 can be found in Chapter 5.

Innovation point 4. A novel recently active MITE, Yuanxiao, is detected in the four strains of Leptospira. Yuanxiao is the first recently active MITE identified in Leptospira, and it plays a role in regulating the neighboring genes. Innovation point 4 can be found in Chapter 6.

Innovation point 5. A novel recently active MITE, Chunjie, is detected in Geobacter uraniireducens Rf4. Interestingly, the structure of the operon does not seem to be disrupted by the insertion of Chunjie, compared with the other copy of the operon, which was duplicated before the insertion. Innovation point 5 can be found in Chapter 7.
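The TIR and DR signals that the abstract uses to characterize MITEs (for example, 19-bp TIRs and 9-bp DRs for Yuanxiao) can be made concrete with a minimal sketch. It does not reproduce how MUST searches a genome, which the abstract does not describe. It only checks one candidate locus for the two structural signals, assuming that the element's two ends are reverse complements of each other (the TIRs) and that identical short direct repeats flank it on both sides. The function and parameter names, the mismatch tolerance and the toy sequence are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def looks_like_mite(genome, start, end, tir_len, dr_len, max_tir_mismatch=1):
    """Check whether genome[start:end] has MITE-like terminal structure.

    The candidate passes if (1) its first tir_len bases match the reverse
    complement of its last tir_len bases with at most max_tir_mismatch
    mismatches, and (2) the dr_len bases immediately before and after it
    are identical direct repeats.
    """
    element = genome[start:end]
    if len(element) < 2 * tir_len or start < dr_len or end + dr_len > len(genome):
        return False
    left_tir = element[:tir_len]
    right_tir = element[-tir_len:]
    tir_ok = hamming(left_tir, reverse_complement(right_tir)) <= max_tir_mismatch
    left_dr = genome[start - dr_len:start]
    right_dr = genome[end:end + dr_len]
    return tir_ok and left_dr == right_dr


# Toy locus (hypothetical): flank + DR + [TIR + core + reverse-complement TIR] + DR + flank
dr = "ACG"
tir = "GGCAT"
element = tir + "TTTAAACC" + reverse_complement(tir)
left_flank = "TTGA"
genome = left_flank + dr + element + dr + "CCTA"
start = len(left_flank) + len(dr)
end = start + len(element)

print(looks_like_mite(genome, start, end, tir_len=5, dr_len=3))      # well-formed candidate
print(looks_like_mite(genome, start + 1, end, tir_len=5, dr_len=3))  # boundary shifted by one base
```

A genome-wide search would need to propose candidate boundaries efficiently and group hits into families. This check covers only the final structural test on a single candidate.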
Keywords/Search Tags: Bioinformatics, Operon, Gene Network, Network Hole, Miniature Inverted-repeat Transposable Elements (MITEs), Approximation Algorithm, Heuristic Algorithm