Identification Of Mouse bHLH Transcription Factor Family And Construction Of Its Regulatory Network In Brain

Posted on: 2008-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
GTID: 1100360215976834
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The regulation of gene expression is a core topic in functional genomics. The recognition of binding sites (BS) by transcription factors (TFs) and the subsequent initiation of downstream gene expression is a key step in this process. Many diseases have been found to correlate with mutations in transcription factors. To date, about 50% of the TFs in model organisms such as human and mouse remain unknown. As part of transcriptional regulation research, identifying the full extent of the TFs, or of a transcription factor family, is a prerequisite for constructing a regulatory network. From yeast to human, the basic/helix-loop-helix (bHLH) transcription factor family plays a central role in cell proliferation, determination, and differentiation. bHLH proteins usually have two functionally distinct domains: the DNA-binding basic domain and the C-terminal HLH domain. The basic domain (~15 amino acids) is rich in basic residues; the HLH domain (~40 amino acids) is formed by two amphipathic α-helices connected by a loop of variable length. Crystal structure studies have shown that bHLH proteins dimerize via their HLH domains, adopt a scissors shape, and bind DNA via their basic domains. So far, genome-wide prediction and evolutionary analysis of the bHLH transcription factor family have been performed in C. elegans, Drosophila, yeast, human, Arabidopsis and rice. A consensus predictive motif comprising 19 elements was established using position association (pa) statistics from 242 bHLH domain sequences. This motif has been shown to identify bHLH domain-containing proteins accurately.

The research contents are as follows. Based on this consensus motif, an optimized query sequence set and dynamic programming (DP), we identified the complete set of bHLH proteins from the mouse proteome databases and carried out a series of bioinformatics analyses.
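As a rough illustration of how dynamic programming can score a protein against an ordered motif, the following minimal sketch counts the largest number of motif elements that can be matched, in order, along a sequence. It is not the thesis's BLAST-DP implementation: the five-element TOY_MOTIF, the CUTOFF value, the unit scoring and the function names are all hypothetical stand-ins, and the real 19-element motif is not reproduced here.

```python
from typing import List, Set

# Hypothetical toy motif: each element is the set of residues accepted at that
# motif position. The real 19-element bHLH motif is NOT reproduced here.
TOY_MOTIF: List[Set[str]] = [set("RK"), set("ER"), set("R"), set("LIV"), set("LIVM")]
CUTOFF = 4  # hypothetical cut-off score, chosen only for this example


def best_motif_score(protein: str, motif: List[Set[str]]) -> int:
    """Return the largest number of motif elements matched, in order, in protein."""
    n, m = len(protein), len(motif)
    # dp[i][j]: best score using the first i residues and the first j motif elements
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either skip a residue or skip a motif element...
            skip = max(dp[i - 1][j], dp[i][j - 1])
            # ...or pair residue i with motif element j (scores 1 on a match).
            hit = dp[i - 1][j - 1] + (1 if protein[i - 1] in motif[j - 1] else 0)
            dp[i][j] = max(skip, hit)
    return dp[n][m]


def is_candidate(protein: str) -> bool:
    """Flag a protein whose best motif score reaches the (hypothetical) cut-off."""
    return best_motif_score(protein, TOY_MOTIF) >= CUTOFF
```

In this toy form, gaps between matched elements are unconstrained; a realistic scorer would presumably bound the spacing between elements and weight positions differently.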
Here the DP algorithm was employed to find the highest-scoring match between a protein and the predictive motif. As a result, 124 mouse bHLH proteins were identified with our BLAST-DP method, 28 of which were not included in the previous report. Moreover, 10 of these additional members are hypothetical proteins and thus potential novel bHLH TFs. Comparative analysis shows that the major sites of the predictive motif are more highly conserved in mouse bHLH proteins than in Arabidopsis. The 124 mouse bHLH proteins were classified into groups A to F according to the nomenclature and phylogenetic analysis. Statistical analysis of the Gene Ontology (GO) annotations of these proteins shows that bHLH proteins tend to perform functions related to cell differentiation and development. Function enrichment analysis across the six groups indicates that the proteins in a given group tend to have specific biological functions. Group A plays roles in cell fate commitment and nervous system development. Group B proteins regulate progression through the cell cycle and have DNA binding activity. The GO annotations signal transduction, transferase activity, transcription coactivator activity and response to stimulus are significantly enriched in group C. Only heart development is significantly enriched in group D. Group E proteins play roles in vasculogenesis and negative regulation of development. Group F bHLH proteins are notably associated with cation binding and zinc ion binding activity. We suggest that the similar sequences of the bHLH proteins within a group underlie their similar functions; the molecular function of the uncharacterized proteins in each group could therefore be inferred.

Although our BLAST-DP method improves on the plain BLAST method for predicting mouse bHLH proteins, the range of organisms to which it can be applied is limited.
The BLAST-DP workflow developed for the mouse proteome or genome probably does not suit other organisms. In addition, the predictions are easily influenced by the researcher's experience through the choice of query sequence set and cut-off score. To overcome these shortcomings, a predictive model was built with profile hidden Markov models (PHMMs). Using experimentally confirmed eukaryotic bHLH TFs as the training data set, the hidden Markov model for predicting bHLH proteins (bHLH-HMM) was built in two steps (hmmbuild and hmmcalibrate). Evaluation of the models and methods shows that the HMM consistently outperforms the BLAST-based methods in sensitivity and accuracy. To find more novel mouse bHLH proteins, we searched the mouse proteome again with the bHLH-HMM. As a result, 113 mouse proteins were identified as bHLH proteins, one of which was not found by the BLAST-DP search. Ninety-five percent of the search results of the two methods are identical. To obtain ortholog information, we also searched the human and rat proteomes and identified orthologs of the mouse bHLH TFs using the best-best method. Notably, the rat bHLH TF family is reported for the first time in this thesis. In total, 125 bHLH TFs were identified from the mouse proteome.

Mouse, rat and human are representative mammals. We first classified the mammalian bHLH TFs into 30 families (bootstrap > 35%) through phylogenetic analysis of all bHLH proteins of mouse, rat and human. Each family is named after its subfamily or its best-known member. The families differ in size, with 11 members per family on average. A previously unknown family with high support was found in the evolutionary tree. All 30 families are well conserved across the three organisms, and no family is present in only one organism.

bHLH TFs have been demonstrated to play crucial roles in the development of the central nervous system (CNS).
To date, reports on the regulatory network of the bHLH transcription factor family are rare. To understand the regulatory mechanisms of bHLH TFs in mouse brain, we inferred the regulatory network of bHLH factors from genome-wide gene expression profiles with the Module Networks method.

A regulatory network comprising 15 highly important bHLH TFs and 153 target genes was constructed and divided into 28 modules according to expression profiles. A regulatory motif search reveals the complexity and diversity of the network. Each module is named after its most significantly enriched molecular function. In addition, 26 cooperative bHLH TF pairs were detected in the network. This cooperation probably indicates protein interaction or regulation between the TFs. Interestingly, active TFs in the network such as Neurod6 and Hey2 regulate more than one module. Cross-repression between these two TFs in different tissues or brain regions is observed in our results. We searched for transcription factor binding sites (TFBS) in the promoters of the target genes as supporting evidence; more than seventy percent of the TF-target gene pairs in the network were validated. Moreover, literature mining provided additional support for five modules. Experimentally, the regulatory relationships among key components of the largest module were validated in mutant animals. Our network is reliable; it will help in understanding the regulatory mechanisms of bHLH TFs in mouse brain and be useful for further analyses by the experimental community.

In summary, focusing on the mouse bHLH transcription factor family and using the mouse genome, proteome, public chip data and a variety of bioinformatics approaches, the studies in this thesis proposed BLAST-DP, a more reliable and convenient method for predicting the mouse bHLH protein family, and built an HMM of bHLH TFs that is applicable to all eukaryotes. We obtained 125 mouse bHLH TFs using the two predictive methods described above.
Classification, evolutionary analysis, function enrichment analysis and ortholog identification were performed as well. Finally, we constructed the first transcriptional regulatory network of bHLH TFs in mouse brain using all inferred mouse bHLH TFs and chip data. Furthermore, module analysis, analysis of network properties, evaluation and experimental confirmation were carried out. Our results should not only advance the study of transcriptional regulatory mechanisms in mouse but also be useful for the study of related human diseases. Moreover, the methods and analysis workflow built in this thesis can be extended to other transcription factor families, and even to the prediction of whole transcriptional regulatory networks.
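The ortholog identification step mentioned in the abstract uses a "best-best" method. Assuming this refers to the common reciprocal-best-hit criterion (an interpretation, not something the abstract states), it could be sketched as below. The score dictionaries, protein identifiers and function names are hypothetical; in practice the scores would come from pairwise sequence searches between two proteomes.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query_id, target_id) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring target."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical toy scores; identifiers are placeholders, not real proteins.
mouse_vs_human: Scores = {
    ("mA", "hA"): 90.0, ("mA", "hB"): 40.0,
    ("mB", "hB"): 70.0, ("mC", "hB"): 60.0,
}
human_vs_mouse: Scores = {
    ("hA", "mA"): 88.0, ("hB", "mB"): 72.0, ("hB", "mC"): 50.0,
}
orthologs = reciprocal_best_hits(mouse_vs_human, human_vs_mouse)
```

Here mC is left unpaired because its best human hit, hB, prefers mB; this strictness is the usual reason for choosing a reciprocal criterion over one-directional best hits.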
Keywords/Search Tags: Transcription factor, Regulatory network, basic/helix-loop-helix, mouse