Identifying Cis Regulatory Motifs In Transcription Of Prokaryotic Genomes

Posted on: 2011-10-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Q Liu
GTID: 1100360305450183
Subject: Operational Research and Cybernetics
Abstract/Summary:
Bioinformatics is a young interdisciplinary field that applies computer science, mathematics and information theory to research in molecular biology. Over the last decade, its rapid development has dramatically advanced biological research and has also supplied many challenging problems to other fields. The main topic of this thesis is the prediction of cis regulatory motifs in prokaryotes using combinatorial techniques.

Transcription initiation is regulated through interactions between trans-acting elements, referred to as transcription factors, and cis-regulatory elements, called DNA binding sites (or motifs, when referring to the sequence patterns of the binding sites). Accurate identification of the cis-regulatory elements encoded in a genome provides useful information about transcriptionally co-regulated genes, which is key to elucidating transcription regulation networks.

We first present a computational method for the motif-finding problem, built on a motif model for motif representation and evaluation. Instead of scoring candidate motifs individually, as existing motif-finding programs do, our method scores groups of candidate motifs with similar sequences, called motif closures, using a p-value, which substantially improves prediction reliability over existing methods. The new p-value scoring scheme is sequence-length independent, and hence allows predicted motifs of different lengths to be compared directly on the same footing. We have implemented this method as a computer program, MREC, and have tested it extensively on both simulated and biological data from prokaryotic genomes. The test results indicate that, for the vast majority of cases in our test set, MREC picks out the actual motif with the correct length as the best-scoring candidate.
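The idea of scoring a group of similar candidate sites with a p-value, rather than scoring one candidate at a time, can be illustrated with a minimal sketch. The abstract does not give MREC's definition of a motif closure or its p-value formula, so everything below is an assumption made for illustration only: the closure is taken to be all windows within a fixed Hamming distance of a seed k-mer, the background is uniform and independent over A, C, G and T, and the p-value is a binomial tail. The function names (`closure_instances`, `match_probability`, `closure_p_value`) are hypothetical and are not part of MREC.

```python
from math import comb


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closure_instances(seed, sequences, max_mismatch):
    """Return (sequence index, position) of every window within
    max_mismatch mismatches of the seed (assumed closure definition)."""
    k = len(seed)
    hits = []
    for i, seq in enumerate(sequences):
        for p in range(len(seq) - k + 1):
            if hamming(seed, seq[p:p + k]) <= max_mismatch:
                hits.append((i, p))
    return hits


def match_probability(k, max_mismatch, alphabet=4):
    """Chance that a random k-mer (uniform, independent letters) lies
    within max_mismatch mismatches of a fixed seed."""
    q = 1.0 / alphabet
    return sum(comb(k, m) * (1 - q) ** m * q ** (k - m)
               for m in range(max_mismatch + 1))


def binomial_tail(n, x, p):
    """P[X >= x] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(x, n + 1))


def closure_p_value(seed, sequences, max_mismatch):
    """Assumed score: probability of seeing at least the observed number
    of closure members among all windows under the background model."""
    k = len(seed)
    n_windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    observed = len(closure_instances(seed, sequences, max_mismatch))
    return binomial_tail(n_windows, observed, match_probability(k, max_mismatch))


if __name__ == "__main__":
    promoters = ["TTACGTTT", "GGACCTGG"]
    print(closure_instances("ACGT", promoters, 1))
    print(closure_p_value("ACGT", promoters, 1))
```

Under these assumptions, a seed whose neighbourhood is over-represented in the promoters receives a small p-value, while a seed with no members receives a p-value of 1. The binomial treatment ignores overlap between adjacent windows, which a real scoring scheme would need to address.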
We compared our prediction results with those of two motif-finding programs, Cosmo and MEME, and found that MREC outperforms both programs by a large margin across all the test cases. The MREC program, written in C for Linux, is available at http://csbl.bmb.uga.edu/-bingqiang/MREC1/.

After MREC, we present a new program, BOBRO, for predicting cis regulatory motifs in a given set of promoter sequences. The algorithm substantially improves prediction accuracy and extends the scope of applicability of existing programs, based on two key ideas: (a) a highly effective method for reliably assessing how likely each position in each given promoter is to be the (approximate) start of a sequence motif conserved among the given promoters, which allows each conserved motif to be identified accurately by finding maximal cliques in a graph defined over the positions most likely to be motif starts; and (b) a highly reliable way of recognizing actual motifs among the identified cliques, based on the concept of motif closure introduced in MREC. We systematically compared the prediction performance of BOBRO with that of five popular prediction programs on large-scale data sets, and found that BOBRO is at least 42% more accurate than the best-performing of the five across all the test data sets. Our genome-scale application of BOBRO to E. coli K12, in conjunction with phylogenetic footprinting information, identified 1,472 experimentally confirmed cis regulatory motifs. BOBRO has two additional attractive properties: it does not require the lengths of the motifs to be identified, and it can find multiple conserved motifs simultaneously. The executable code of BOBRO, written in ANSI C for Linux, is available at http://csbl.bmb.uga.edu/-maqin/motif finding/, and a server version of the program is available upon request.
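The clique-finding step in idea (a) can be illustrated with a small sketch. This is not BOBRO's algorithm: the abstract does not say how candidate start positions are scored or how edges are defined, and BOBRO does not need the motif length as input. The sketch below assumes, purely for illustration, a fixed window length `k`, an edge between two positions in different promoters when their windows differ in at most `max_mismatch` letters, and the standard Bron-Kerbosch procedure with pivoting to enumerate maximal cliques. The names `build_graph` and `maximal_cliques` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(sequences, k, max_mismatch):
    """Nodes are (sequence index, start position); an edge joins two nodes
    from different sequences whose k-mers are within max_mismatch
    mismatches (assumed edge rule, fixed k for simplicity)."""
    nodes = [(i, p) for i, s in enumerate(sequences)
             for p in range(len(s) - k + 1)]
    adj = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] == v[0]:
            continue  # only link positions in different promoters
        a = sequences[u[0]][u[1]:u[1] + k]
        b = sequences[v[0]][v[1]:v[1] + k]
        if hamming(a, b) <= max_mismatch:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # The pivot with the most neighbours in p prunes the most branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


if __name__ == "__main__":
    promoters = ["TTACGTTT", "GGACGTGG", "CCACGTCC"]
    graph = build_graph(promoters, k=4, max_mismatch=0)
    best = max(maximal_cliques(graph), key=len)
    print(sorted(best))
```

In this toy input the only 4-mer shared by all three promoters is ACGT, so the largest clique collects its start position in each sequence. Each node is a candidate motif start, and a clique spanning many promoters is a candidate conserved motif; in practice such cliques would still need to be evaluated statistically, as idea (b) describes.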
Keywords/Search Tags: Bioinformatics, Transcription factor binding sites, Motif finding, Prokaryote, Combinatorial algorithm