Computational Analysis Of The Specificity Of DNA Recognition By AtERFs And In Silico Identification Of The Target Gene Candidates Of DREBs In Arabidopsis Genome

Posted on: 2009-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Wang
Full Text: PDF
GTID: 1100360272476564
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Arabidopsis ethylene responsive element binding factors (AtERFs) form a transcription factor superfamily. While the functions of most AtERFs are unknown, a number of AtERFs are reported to play essential roles in the regulation of stress-related genes by binding, through their DNA-binding domains (DBDs), i.e. ERF domains, to the consensus GCC-box motif in the regulatory region. Phylogenetic analysis of the ERF domains led to a classification of the AtERF superfamily into four predominant subfamilies.
In the first section of this thesis, computational analysis of the structural properties of AtERF-DNA motif complexes was performed. We selected four AtERF proteins, AtERF1, AtEBP, CBF1 and AtERF4, as representatives of the four subfamilies and constructed four AtERF DBD-DNA complexes through homology modeling. Molecular dynamics simulations were then performed to explore the interactions between the six conserved residues and the DNA motif, the GCC-box. By comparing these interactions among the four AtERF DBD-DNA complexes, we revealed the common properties of protein-DNA interactions among the AtERFs and the differential roles of each base of the GCC-box in specific recognition by AtERFs. Our results suggested that three amino acid residues, Arg29, Glu39 and Arg41, played a vital role in the direct readout of DNA. The positions of the consensus sequence GCCGCC contribute unequally to binding by ERF domains. The CGNC element in the GCC motif may be required for recognition by ERF domains. Our results provide structural evidence for the sequence-dependent recognition mechanism of AtERFs.
The identification of downstream target genes of specific transcription factors (TFs) is necessary for understanding cellular responses to environmental stimuli. Most known gene regulatory networks are highly complex because they involve cooperative interactions and feedback regulation.
The discovery of the direct targets of transcription factors is a fundamental step in elucidating the construction of regulatory networks. The availability of genome sequences has made it possible to discover the target genes of a specific transcription factor by locating its specific recognition motifs in the genome. In practice, however, the task is still difficult because of the complexity of plant genomes. During the last decade, many computational methods have been developed to identify the target genes of transcription factors. Among these methods, the positional weight matrix (PWM) has been the technique most widely used to describe transcription factor binding sites (TFBSs) and to scan for TFBSs at genome scale. However, because TFBSs are only loosely conserved, PWM-based strategies alone could not identify TFBSs effectively at genome scale. For this reason, approaches that combine the PWM with analysis of the TFBS context were developed to overcome this shortcoming. In essence, these approaches aim to develop algorithms that describe the properties of TFBSs and their contexts.
In the second section of this thesis, we report a novel computational strategy to determine the DREB transcription factor binding sites in the Arabidopsis genome by combining context analysis of the TFBSs with a machine learning approach.
Dehydration responsive element binding proteins (DREBs) are important transcription factors that induce the expression of a series of abiotic stress-related genes and impart stress endurance to plants. They belong to the ethylene responsive element binding factors (AP2-EREBPs) superfamily of 124 members (so-called ERF proteins), of which 57 proteins are in the DREB subfamily.
The ERF proteins share a conserved DNA-binding domain (ERF domain) of 58–60 amino acids that reportedly binds to two typical cis-acting elements: the GCC-box and the C-repeat (CRT)/dehydration responsive element (DRE) motif, which is involved in the expression of cold- and dehydration-responsive genes. It is important to identify the target genes of DREBs in Arabidopsis, since the DREBs play a vital role in various types of biotic and abiotic stress responses. Maruyama et al. identified the downstream genes of DREB1A/CBF3 using two microarray systems. Fowler and Thomashow, and Taji et al., also reported downstream genes of DREB proteins. Nevertheless, the full set of DREB target genes has yet to be identified.
We focused on the differences between DRE frame sequences (DFSs; DNA fragments of 206 bp, retrieved from the PPRs of MGs, that contain a DRE motif (A/GCCGAC) at their center) and non-DRE frame sequences (nDFSs; DNA fragments of 206 bp, collected randomly from the PPRs of the Arabidopsis genome, with a DRE motif inserted artificially at their center). A machine learning approach, specifically a support vector machine (SVM) based classifier, was developed to categorize DRE-containing sequences into DFSs and nDFSs. Our results suggested that this algorithm was effective in discovering DREB binding sites in the promoter regions of target genes, and thus in inferring the target genes of DREBs in Arabidopsis. Furthermore, we predicted 474 candidate genes as direct targets of DREBs. With reference to the AtGenExpress microarray data, we obtained 268 direct targets of DREBs that were inducible by abiotic stress stimuli such as cold, salinity and drought during a 24-hour observation period. The results obtained in this study provide primary information that warrants further experimental investigation of the anti-stress regulatory network of DREBs in plants.
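The construction of 206-bp frames around a DRE motif can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the 6-bp motif (A/GCCGAC) sits exactly in the middle of the frame with 100 bp of flanking sequence on each side (100 + 6 + 100 = 206), that only the forward strand is scanned, and that a background fragment of 200 bp is available for building an artificial frame. The function names `extract_dre_frames` and `make_artificial_frame` are hypothetical, and the SVM classification step is not shown.

```python
import re

# DRE motif A/GCCGAC; the lookahead lets overlapping occurrences be reported.
DRE_PATTERN = re.compile(r"(?=([AG]CCGAC))")
MOTIF_LEN = 6
FLANK = 100  # assumed flank length: 100 + 6 + 100 = 206 bp


def extract_dre_frames(promoter, flank=FLANK):
    """Return (motif_start, frame) for every DRE motif with full flanks.

    Motifs too close to either end of the promoter are skipped, because a
    complete frame cannot be cut around them.
    """
    seq = promoter.upper()
    frames = []
    for match in DRE_PATTERN.finditer(seq):
        start = match.start(1)
        left = start - flank
        right = start + MOTIF_LEN + flank
        if left < 0 or right > len(seq):
            continue
        frames.append((start, seq[left:right]))
    return frames


def make_artificial_frame(background, motif="ACCGAC", flank=FLANK):
    """Insert a DRE motif at the center of a random background fragment."""
    if len(background) != 2 * flank:
        raise ValueError("background must be exactly 2 * flank bases long")
    if not re.fullmatch(r"[AG]CCGAC", motif):
        raise ValueError("motif must match A/GCCGAC")
    return background[:flank].upper() + motif + background[flank:].upper()


if __name__ == "__main__":
    toy_promoter = "T" * 100 + "GCCGAC" + "T" * 100  # toy data, not real sequence
    for start, frame in extract_dre_frames(toy_promoter):
        print(start, len(frame), frame[FLANK:FLANK + MOTIF_LEN])
```

Frames produced this way would still have to be converted into numerical features before a classifier such as an SVM could be trained on them; the abstract does not specify which features were used.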
Keywords/Search Tags:transcription factors, DNA binding, molecular dynamics, target gene, machine learning, Arabidopsis