| Gene expression patterns vary across tissues and developmental stages, leading to cell differentiation, ontogenesis, and the distinct structures and functions of organs. Transcription factors (TFs) are key regulators of gene expression and play crucial roles in controlling it both spatially and temporally. TFs recognize and bind to specific DNA sequences to promote or block transcription, thereby regulating the expression of specific target genes. Identifying the targets of a TF lays a foundation for the study of gene regulation. In this study, we collected human TF ChIP-Seq data from a variety of public resources and customized an analysis workflow to detect reliable TF targets while taking epigenomic states into account. The predictions are available in an open-access database. We also analyzed tissue-specific transcriptional regulation, TF co-association, and the TFs that regulate long non-coding RNAs (lncRNAs), and obtained the following results. First, we obtained large-scale predictions of human TF targets and built an open-source database, hTFtarget. Two strategies were applied to predict TF targets: ChIP-Seq analysis and transcription factor binding site (TFBS) scanning. The first approach is based on ChIP-Seq high-throughput sequencing data. We first collected 3,231 datasets for 488 human TFs from public databases such as ENCODE and NCBI and detected their binding peaks on the human genome. Next, we extracted fewer than five motifs from the top 1,000 peak sequences and scanned all peaks for TFBSs with these motifs. We then quantified the regulatory potential of each TF on its target genes according to the distance from the transcription start site (TSS) using BETA, a scoring model with exponential decay. Finally, to provide an integrated target prediction, the prediction results of individual datasets were collected and filtered according to epigenomic states defined by histone modifications. As a result, the median number of targets per TF is 
342. The second approach predicts TF targets from known motifs. We collected 2,737 position weight matrices (PWMs) for 699 TFs from TRANSFAC, JASPAR and HOCOMOCO and scanned these motifs in conserved regions of the human, rat and mouse genomes to identify potential targets of these TFs. Based on these results, we built a comprehensive human transcription factor target database named hTFtarget (http://bioinfo.life.hust.edu.cn/hTFtarget) to provide search and query services. Users can search the results in several ways, including querying the targets of given TFs or vice versa. Users can also perform conjunctive queries and batch queries and filter query results. Second, we used these predictions to explore the regulatory mechanisms of TFs. We evaluated the association between transcription factor binding and epigenomic states and found that 1) in many cases, the top-ranked peaks of TFs tend to bind genomic regions in the Active TSS and Flanking Active TSS states, and 2) peaks from multiple datasets of the same TF, or of members of the same TF family, tend to be enriched in similar epigenomic states. Next, we analyzed the tissue-specific targets of 14 TFs across 10 cell lines and discussed the functions of these TFs according to the functional enrichment results of their targets. We also applied a machine learning algorithm to impute TF co-association across cell lines and obtained relative importance (RI) profiles of partner factors for each focus factor, which provide a quantitative perspective for estimating TF co-association. Lastly, we analyzed the role of TFs in lncRNA regulation and the impact of SNPs on transcription factor binding. We analyzed the regulatory effect of TFs on different types of genes and found that TFs tend to regulate protein-coding genes and lncRNA genes. We detected 9,815,083 SNPs located in transcription factor binding sites, 231,558 of which lie in the promoter regions of lncRNA genes; these SNPs may result in loss of function at 
individual transcription factor binding sites. By comparing these SNPs, which could cause TFBS loss, with the lncRNA-SNPs collected in lncRNASNP, a database that curates lncRNA and SNP interactions, we found 68,597 SNPs present in lncRNASNP. In summary, we produced systematic predictions of human TF targets and organized the results into the hTFtarget database. We also studied the tissue-specific regulation, co-association and lncRNA regulation of TFs. These data and results provide solid resources for gene expression studies and important clues for understanding the regulatory mechanisms of TFs. |
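
To illustrate the distance-based scoring step described above, the following is a minimal sketch of an exponentially decaying regulatory potential score. It is not the hTFtarget pipeline or the BETA source code: the function name `regulatory_potential`, the 100 kb window, and the constants 0.5 and 4 are assumptions, chosen to follow the form commonly reported for BETA, and the peak coordinates are made up for illustration.

```python
import math


def regulatory_potential(peak_positions, tss, window=100_000):
    """Sum exponentially decaying contributions of peaks around one TSS.

    Assumed form (hypothetical constants): each peak within `window` bp
    of the TSS contributes exp(-(0.5 + 4 * d)), where d is the peak-to-TSS
    distance expressed as a fraction of `window`.
    """
    score = 0.0
    for pos in peak_positions:
        delta = abs(pos - tss) / window  # normalized distance to the TSS
        if delta <= 1.0:  # peaks outside the window contribute nothing
            score += math.exp(-(0.5 + 4.0 * delta))
    return score


# Illustrative coordinates only: two peaks inside the window, one outside.
peaks = [1_000_500, 1_020_000, 1_250_000]
print(regulatory_potential(peaks, tss=1_000_000))
```

Under these assumptions, a peak close to the TSS contributes far more than a distal one, so a gene with several proximal peaks ranks above a gene with a single distant peak.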