
The Recognition Of Long Range Enhancer-Promoter Interactions In Human Cell Lines

Posted on: 2018-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z X Fen
GTID: 1310330542480083
Subject: Biology
Abstract/Summary:
The precise spatiotemporal regulation of gene expression in specific eukaryotic cells is fundamental to various biological processes. Numerous DNA-related processes are regulated cooperatively by diverse cis-regulatory elements, such as transcriptional promoters, enhancers and insulators. Through chromatin folding, three-dimensional chromatin organization can bring distal enhancers into contact with target promoters that lie tens or even hundreds of kilobases away. These long-range enhancer-promoter interactions (EPIs) can play a critical role in the regulation of tissue-specific gene expression and may be linked to human diseases.

In recent years, the development of a number of 3C-based techniques, such as Chromosome Conformation Capture (3C), 4C (circular 3C), 5C (3C carbon copy), Hi-C (a 3C variant) and ChIA-PET, has made it possible to explore spatial interactions further. Meanwhile, with the development of diverse high-throughput sequencing techniques, a large number of deep sequencing profiles of genomic signatures have been generated on different experimental platforms. These large-scale omics data make it practicable to study the relationship between spatial interactions and diverse genomic signatures at different genomic scales.

In this work, we developed an improved computational method to identify long-range enhancer-promoter interactions in four human cell lines (GM12878, H1-hESC, HeLa-S3 and K562). We then identified numerous potentially influential signatures in EPI prediction and analyzed their location, distribution and correlation properties. Finally, we used several regression models to quantify the relationships between the expression levels of genes regulated by distal enhancers and diverse genomic signatures (e.g., transcription factors, histone modifications, DNA methylation, enhancer RNA), and identified the distinct properties of these signatures in gene expression regulated by distal enhancers. The research topics are outlined as follows:

1. Based on a previous 5C dataset, we extracted features of enhancer, promoter and loop regions from diverse genomic signatures, including transcription factors (TFs), histone modifications (HMs), DNA methylation, enhancer RNA, nucleosome position, chromatin state and topologically associating domains. Using these features, we combined the BRCFS feature selection algorithm with a Random Forest classifier to predict long-range enhancer-promoter interactions in the four human cell lines. Compared with the work of Roy et al., we obtained AUPR values about 11%-16% higher in 10-fold cross-validation, and the AUPR values in the independent test increased by 4%-8%. By analyzing variable importance in the prediction, we found essential roles for many potential signatures, such as enhancer RNA and nucleosome position. Moreover, the signature features of the loop regions contributed strongly to the prediction. In addition, the contributions of diverse regulatory signatures to the prediction showed region-specific and cell type-specific patterns. Finally, we found that the distribution levels of the top-ranked signature features differed distinctly between the interacting and non-interacting sets.

2. Considering that enhancer-promoter interactions are regulated cooperatively by diverse functional genomic signatures, sequence elements and DNA spatial structure, we developed a more efficient predictive model that jointly considers signature features of TFs, HMs, DNA methylation, enhancer RNA, nucleosome position, DNA structural properties and TF binding motifs. Using the combined features of the enhancer, promoter and loop regions, we applied Random Forest and Gradient Boosting Machine methods to predict enhancer-promoter interactions in human cell lines. Compared with previous results on the same datasets, our best accuracies in the 10-fold cross-validation test were about 15%-24% higher in the same cell lines, and the accuracies in the independent test were about 9%-14% higher in new cell lines. We also comprehensively studied the contributions of diverse types of influential regulatory factors and found that DNA structural properties and TF binding motifs contributed greatly to the prediction. Following the variable importance analysis, we built network models based on partial correlation coefficients and identified important relationships among the top-ranked functional genomic features.

3. In the four human cell lines, we developed several regression methods to model the relationship between the expression levels of genes regulated by distal enhancers and diverse genomic signatures (i.e., 11 HMs, >120 TFs, DNase I, enhancer RNAs, DNA methylation, nucleosome position). Analysis of the results showed that the predicted and observed values were strongly correlated in both the positive (interacting) and negative (non-interacting) sets. However, the correlation in the interacting sets was stronger than that in the non-interacting sets, which indicates that enhancer-promoter interactions may act cooperatively with a variety of genomic signatures and contribute to gene expression.

4. By analyzing the contribution of diverse signature features to gene expression levels, we found that histone modifications in the promoter and loop regions play important roles in gene expression, whereas transcription factors play important roles in the enhancer and promoter regions. By comparing the importance scores of different features between the positive and negative sets in the same cell line, we found that the scores of many signatures varied greatly, indicating that these signatures act cooperatively with the distal enhancers and contribute to gene expression.
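The AUPR values reported in the abstract summarize how well a classifier ranks interacting enhancer-promoter pairs above non-interacting ones. As a minimal sketch, and not the author's actual evaluation code, the step-wise area under the precision-recall curve (often called average precision) can be computed as follows. The function name `average_precision` and the toy labels and scores are assumptions introduced purely for illustration.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: iterable of 0/1 values (1 = interacting pair, 0 = non-interacting).
    scores: classifier scores, where higher means "more likely interacting".
    This is a hypothetical helper for illustration only.
    """
    labels = list(labels)
    scores = list(scores)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank all pairs from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Treat tied scores as one threshold so the result does not
        # depend on the arbitrary order of tied items.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # Add the rectangle gained by this increase in recall.
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


# Toy example: two interacting and two non-interacting pairs.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_aupr = average_precision(toy_labels, toy_scores)
```

In a 10-fold cross-validation setting, a value like this would typically be computed on each held-out fold and then averaged; the abstract does not state which AUPR estimator was used.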
Keywords/Search Tags: Enhancer-promoter interaction, Histone modification, Transcription factor, DNA structural properties, Transcription factor binding motifs