
The Distribution Feature Of The Transcription Factor Binding And Histone Modification And The Identification Of Genes With Different Expression Level In Two Cell Lines

Posted on: 2017-04-29 | Degree: Master | Type: Thesis
Country: China | Candidate: G G Xue | Full Text: PDF
GTID: 2180330485961092 | Subject: Biophysics
Abstract/Summary:
Transcription factor (TF) binding and histone modifications (HMs) are important for the precise control of gene expression. TFs can activate or repress gene transcription by binding to specific, short sequence motifs in regulatory regions such as promoters and enhancers. Furthermore, genes are often regulated by more than one TF, and this regulation is frequently associated with other factors, including histone modifications. Chromatin immunoprecipitation coupled with sequencing (ChIP-Seq) has been developed to identify the genome-wide localization of protein-DNA binding sites. These data sets provide the raw material for studying the regulatory functions of TFs and HMs in gene expression. Using ChIP-Seq combined with peak-finding algorithms, the ENCODE Consortium has provided genome-wide locations for the binding sites of many TFs and for many HMs. For the human GM12878 and K562 cell lines, we downloaded signal peak data for 55 TFs and 11 HMs in narrowPeak and broadPeak formats, together with the corresponding RNA-Seq data. We analyzed differences in the distribution of the 55 TFs and 11 HMs across the whole genome and in the regulatory regions upstream and downstream of transcription start sites (TSSs). The results indicated that the distribution of TF binding differed between the two cell lines. We defined an overlapping ratio and an average overlapping ratio to study possible interactions among TFs and HMs. Based on the hypothesis that colocalized factors act together, we examined the overlap between different TF binding sites and HMs to investigate their possible cooperation. We found multiple combinations of TF binding and HMs that are consistent with gene expression level. These results reveal a certain correlation between the distribution of TF binding or HMs and gene expression. Based on the RNA-Seq data, we built two gene sets, one with high and one with low expression levels.
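As an illustration of the overlapping ratio mentioned above, the following is a minimal Python sketch. The abstract does not give the exact definitions, so the formulas here are assumptions: the overlapping ratio of A with respect to B is taken as the fraction of peaks in A that overlap at least one peak in B, and the average overlapping ratio as the mean of the two directional ratios. The function names and the example peaks are hypothetical.

```python
# Sketch only: the definitions below are assumed, not taken from the thesis.
# A peak is a (chrom, start, end) tuple with half-open coordinates.

def overlaps(p, q):
    """Return True if peaks p and q share at least one base on the same chromosome."""
    return p[0] == q[0] and p[1] < q[2] and q[1] < p[2]


def overlapping_ratio(peaks_a, peaks_b):
    """Assumed definition: fraction of peaks in A overlapping any peak in B."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for p in peaks_a if any(overlaps(p, q) for q in peaks_b))
    return hits / len(peaks_a)


def average_overlapping_ratio(peaks_a, peaks_b):
    """Assumed definition: mean of the two directional overlapping ratios."""
    return 0.5 * (overlapping_ratio(peaks_a, peaks_b)
                  + overlapping_ratio(peaks_b, peaks_a))


# Hypothetical peak sets for one TF and one HM.
tf_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
            ("chr2", 50, 150), ("chr2", 900, 1000)]
hm_peaks = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 0, 100)]

ratio_tf_hm = overlapping_ratio(tf_peaks, hm_peaks)
ratio_hm_tf = overlapping_ratio(hm_peaks, tf_peaks)
avg_ratio = average_overlapping_ratio(tf_peaks, hm_peaks)
```

The pairwise scan is quadratic in the number of peaks, which is fine for a small example; genome-scale peak files would normally call for sorted intervals or an interval tree.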
Using the TF or HM association strength, which integrates signal peak intensity and proximity to genes, the genes were well classified with the SVM algorithm; the highest prediction accuracy was 93%.
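One way an association strength could combine peak intensity with proximity to a gene is sketched below in Python. The abstract does not state the formula, so this is an assumption: each peak contributes its signal weighted by an exponential decay in its distance from the TSS. The function name, the decay length d0, and the window size are hypothetical parameters chosen for illustration.

```python
import math

# Sketch only: the weighting scheme is an assumed form, not the thesis's formula.

def association_strength(tss, peaks, d0=5000.0, window=50000):
    """Sum peak signals near a TSS, down-weighted by distance.

    tss    -- TSS position of the gene (bp)
    peaks  -- list of (center, signal) pairs on the same chromosome
    d0     -- hypothetical decay length (bp)
    window -- peaks farther than this from the TSS are ignored (bp)
    """
    total = 0.0
    for center, signal in peaks:
        distance = abs(center - tss)
        if distance <= window:
            total += signal * math.exp(-distance / d0)
    return total


# Hypothetical peaks for one factor around a gene with its TSS at 10,000 bp.
example_peaks = [(10000, 10.0), (15000, 20.0), (100000, 50.0)]
strength = association_strength(10000, example_peaks)
```

Computing one such value per factor gives each gene a feature vector, which could then be passed to an SVM classifier to separate the high- and low-expression gene sets.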
Keywords/Search Tags: Transcription factor, Histone modification, Overlapping ratio, Gene expression level