The Mechanism Of Transcription Factors C-Myb, AtERFs Binding To Their Target Cis-elements

Posted on: 2009-06-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Yang
GTID: 1100360272476545
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Gene expression and its regulation depend fundamentally on the sequence-specific recognition of target DNA by transcription factors. Over the last decade, efforts have been made to find a general readout mechanism for the information encoded in DNA. However, specific protein-DNA interaction is a redundant process that usually involves both a direct readout mechanism (direct protein-DNA contacts) and an indirect readout mechanism (indirect contacts between the protein and the sugar-phosphate backbone of DNA, made through polar contacts mediated by polar molecules such as water). Both processes contribute to the free energy changes and the induced conformational changes that accompany formation of the protein-DNA complex. Consequently, it has been difficult to define a consensus DNA motif for a given protein, even when the precise structure of the complex is known.

Myb transcription factors belong to a superfamily of proto-oncogene products that has been identified in all kingdoms, including animals, plants and fungi. Vertebrates express three highly related Myb proteins, A-Myb, B-Myb and c-Myb, of which c-Myb is the best characterized, particularly in the differentiation and proliferation of hematopoietic cells. All Myb proteins possess highly related DNA-binding domains (DBDs) but have distinct biological functions. The DBD of c-Myb consists of three imperfect tandem repeats of 51-52 amino acids, designated R1, R2 and R3 from the N terminus; the last two repeats, R2 and R3, form the minimal unit for specific DNA binding. R2 and R3 pack closely into the major groove of DNA so that their two recognition helices contact each other directly and bind cooperatively to the specific DNA consensus PyAACNG (where Py represents a pyrimidine), with the conserved residues of R3 and R2 contacting the PyAAC and NG half sites, respectively.
Myb binding site one (MBS-1) from the simian virus 40 enhancer, for example, possesses a single AACNG motif that c-Myb recognizes through this sequence-specific mechanism.

The chicken myeloid protein gene mim-1 is one of the Myb target genes. It is activated by c-Myb exclusively in myelomonocytic cells and has therefore become an interesting model system for studying how c-Myb activates a target gene in a lineage-specific manner. Analysis of the mim-1 promoter region revealed a cluster of three PyAACNG consensus sites (sites A, B and C) with completely different flanking sequences. Interestingly, these three sites showed different in vitro binding affinities for bacterially expressed v-Myb protein, with site A binding most strongly. Whereas sites B and C each contain only a single AACNG motif, site A (designated the Myb responsive element, MRE, in this work) contains a dual AACNG motif arranged as an imperfect palindrome: the forward strand reads 5'-TAACGGTTT-3' (designated MRE-f) and the reverse strand reads 5'-AAACCGTTA-3' (designated MRE-r). This raises two questions: how c-Myb selectively recognizes the dual binding site, and whether the stronger binding affinity observed for this site results from c-Myb binding to both strands of site A. Although extensive in vitro analyses have been performed using c-Myb and oligonucleotides containing the core sequence of site A of the mim-1 gene, those binding experiments did not resolve the binding discrimination quantitatively.

In the present work, we compared the kinetic patterns of R2R3 binding to MRE and to MBS-1, and determined the binding specificity of R2R3 for MRE by analyzing the changes in binding free energy upon single-base substitution.
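The sequence-level asymmetry of site A can be illustrated with a short script. This is a minimal sketch and not part of the dissertation's methods: it only expands the consensus as written above (Py = C or T, N = any base) into regular expressions and scans the two MRE strands quoted in the text. The function names `reverse_complement` and `find_sites` are hypothetical helpers introduced for illustration.

```python
import re

# Consensus patterns expanded from the text: Py = C or T, N = any base.
# Lookahead groups are used so that overlapping matches would also be reported.
FULL_SITE = re.compile(r"(?=([CT]AAC[ACGT]G))")  # PyAACNG
CORE_SITE = re.compile(r"(?=(AAC[ACGT]G))")      # AACNG

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(pattern: re.Pattern, seq: str) -> list:
    """Return (0-based start, matched text) for every match of pattern in seq."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


MRE_F = "TAACGGTTT"                # forward strand of site A, as quoted in the text
MRE_R = reverse_complement(MRE_F)  # reverse strand, read 5'->3'

for name, seq in (("MRE-f", MRE_F), ("MRE-r", MRE_R)):
    print(name, seq,
          "AACNG:", find_sites(CORE_SITE, seq),
          "PyAACNG:", find_sites(FULL_SITE, seq))
```

Under these assumptions, both strands carry an AACNG core, but only the forward strand has a pyrimidine immediately 5' of the core, so only MRE-f matches the full PyAACNG consensus. This is a string-matching observation only; it says nothing by itself about measured binding affinities.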
Although the kinetic and free energy analyses revealed the specificity of the protein-DNA interactions, it was difficult to monitor the dynamic process of induced conformational change during protein-DNA interaction. Molecular dynamics (MD) simulation provided a convenient tool for exploring the dynamic behavior of R2R3 and MRE and the mechanism of their interaction. The results have important implications for understanding the asymmetric mechanism by which c-Myb recognizes its imperfect palindromic consensus containing the dual AACNG motif.

To define the mechanism of c-Myb binding to the dual AACNG motif, we studied this binding in detail using both filter binding assays and in silico analyses. The binding assays revealed that R2R3 selectively recognized the forward strand 5'-TAACGG-3' of site A in mim-1, and that this binding followed the commonly reported mechanism with PyAACNG as the readout "codon", i.e. strong binding of R3 at the AAC core and moderate binding of R2 at the second half site, GG. In the MD simulations, analyses of protein conformational variation, DNA local structure and protein-DNA contacts supported the experimental observation that the forward strand of site A in mim-1 is the recognition motif of c-Myb. These results suggest that an asymmetric mechanism of c-Myb recognition of the inherent dual AACNG motif may be crucial for c-Myb regulation of specific target genes.

The AP2/ERF superfamily of transcription factors (TFs) is one of the largest TF gene families in the plant kingdom and is characterized by the presence of the AP2/ERF domain. The AP2 domain was first identified in Arabidopsis as a repeated 68-amino-acid motif of the AP2 protein, which is functionally involved in floral development. The ERF domain was first identified as a conserved 58-59 amino acid motif in four DNA-binding proteins from tobacco and was shown to bind specifically to a GCC box.
In Arabidopsis, the AP2/ERF superfamily comprises three families, ERF, AP2 and RAV, which are defined as follows. TFs in the AP2 family contain two repeated AP2 domains; TFs in the ERF family contain a single ERF domain; and TFs in the RAV family contain both a single AP2/ERF domain and a B3 domain, a DBD conserved in other plant-specific TFs. The ERF family is further divided into two major subfamilies, the EREBP (ERF) subfamily and the DREB subfamily. After sequencing of the Arabidopsis genome was completed, 145 genes were predicted to encode proteins containing the AP2/ERF domain, of which 83% (121 genes) belong to the ERF family. The solution structure of the Arabidopsis AtERF1 ERF domain (PDB ID: 1GCC) was solved by heteronuclear multidimensional NMR. The domain consists of a three-stranded antiparallel β-sheet and an α-helix packed approximately parallel to the β-sheet, with seven thoroughly conserved amino acids in the β-sheet (Arg150, Arg152, Trp154, Glu160, Arg162, Arg170 and Trp172) uniquely contacting the bases of the target DNA in the major groove. Phylogenetic analyses of the ERF domains of all members of the ERF family show that Arg150, Glu160 and Trp172 are completely conserved among the 122 proteins in the ERF family, and that more than 95% of ERF family members contain Arg152, Arg162 and Arg170. Results from the few AtERFs studied so far, however, suggest that the conserved ERF domains do not all prefer the same DNA consensus. For instance, some AtERFs have been shown to bind in vitro to the ethylene-responsive element (ERE), a GCCGCC motif known as the GCC motif, and to mediate GCC-motif-dependent transcription (activation or repression) in Arabidopsis leaves. This ERE was first reported as a binding site (referred to as the GCC box) of some tobacco ERF proteins [8] and was later presumed to be the target site of many other ERF proteins.
The ERF protein AtEBP was also found to protect the GCC box in a DNase I footprinting analysis. In contrast, the dehydration-responsive element (DRE), a TACCGACAT motif in the drought-responsive gene rd29A from Arabidopsis, has been shown to be the recognition site of DRE-binding proteins (DREBs), transcription factors that have an authentic ERF domain and are involved in the induction of rd29A expression by low-temperature stress. An element similar to the DRE, the C-repeat (TGGCCGAC), was identified in the cold-inducible gene cor15a and reported to function in cold-responsive regulation through binding by another ERF protein, CBF1. The similarity of the reported ERF binding elements and the high homology of ERF domains across the whole ERF family have led to the speculation that ERF domains from the various subgroups of the family recognize a binding site with a sensitive common core, with divergent short flanking bases governing differential recognition. We have previously demonstrated that different ERF domains diverge in their DNA recognition modes; however, other supporting evidence has so far been lacking. Indeed, little is known about how these differences matter for the functions of the ERFs in this transcription factor family, most of which have not yet been studied.

In this work, we selected four representatives from different functional subgroups of the Arabidopsis ERF family (three from the EREBP subfamily and one from the DREB subfamily), following the functional classification of the ERF family by Nakano et al. We identified the core recognition motifs preferred by each of the four domains using a random sequence selection method. We then characterized the binding specificities of the four domains for a DRE-motif-containing sequence in vitro and in vivo.
The results revealed both the common and the individual features of the various ERF domains in recognizing the same binding site, and demonstrated the importance of these features in determining functional variation within the ERF TF family. AtERF1, AtERF4, AtEBP and CBF1 belong to different phylogenetic groups within the family. EMSA analyses revealed that the ERF domains of these four proteins were capable of binding either the GCC motif or the DRE motif, and that the motif preference of each ERF domain was related to its phylogenetic classification. The DNA-binding motifs of the four ERFs were identified: the motifs obtained for AtERF1 and AtERF4 were GCC-motif-like, the motif for CBF1 was DRE-motif-like, and all the motifs contained a conserved cCG*c core. In vitro and in vivo binding assays with the DRE motif showed that the four ERF domains exhibited a similar binding pattern at the CG core but different base preferences in the flanking regions. These findings suggest that the common CG core, which is likely determined by the residues highly conserved among all ERF members, may be the essential foundation for ERF domain binding to a given motif, whereas the different flanking-base preferences of individual ERF domains, which appear to be attributable to subfamily- or group-specific residues, may be crucial for divergent ERF domains to discriminate their specific binding motifs from various similar sequences. The results have important implications for understanding how divergent members of the conserved ERF family specifically discriminate among various binding sites.
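The idea of a shared CG core with differing flanking bases can be seen directly in the three element sequences quoted in this abstract (the GCC motif GCCGCC, the DRE TACCGACAT, and the C-repeat TGGCCGAC). The following is a minimal sketch for illustration only; it does not reproduce the random sequence selection or binding assays of the study, and it does not attempt to expand the cCG*c notation. The helper name `core_with_flanks` and the flank width of two bases are assumptions made here.

```python
# Element sequences exactly as quoted in the abstract.
ELEMENTS = {
    "GCC motif (ERE)": "GCCGCC",
    "DRE": "TACCGACAT",
    "C-repeat": "TGGCCGAC",
}


def core_with_flanks(seq: str, core: str = "CG", width: int = 2) -> list:
    """Return (left flank, core, right flank) for each occurrence of core in seq.

    Flanks are truncated at the ends of the sequence; width is an
    illustrative choice, not a value taken from the study.
    """
    hits = []
    start = seq.find(core)
    while start != -1:
        left = seq[max(0, start - width):start]
        right = seq[start + len(core):start + len(core) + width]
        hits.append((left, core, right))
        start = seq.find(core, start + 1)
    return hits


for name, seq in ELEMENTS.items():
    print(f"{name:16s} {seq:10s} {core_with_flanks(seq)}")
```

Each of the three elements contains exactly one CG dinucleotide, while the two bases on either side differ between the GCC motif and the DRE-like elements, which is the kind of flanking variation the abstract attributes to subfamily- or group-specific recognition.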
Keywords/Search Tags:transcription factor, DBD, cis-element, binding specificity, c-Myb, ERF domain