
Study On Transcriptional Pause Based On GRO-seq Data

Posted on: 2022-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Chen
GTID: 2480306725492504
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Transcriptional pause is an important rate-limiting step in gene transcription and often occurs 25-50 nt after the start of transcription elongation. Studies have indicated that cis-acting elements and trans-acting factors are the two main determinants of transcriptional pause. The regulation of pause and pause release plays an important role in gene expression during cell differentiation and development, and abnormal pause release is associated with cancer. At present, the specific mechanisms and regulatory signals of transcriptional pause are not fully understood, and neither the genome-wide patterns of pausing nor their relationship to gene expression profiles are clear. This thesis uses GRO-seq data to analyze the genome-wide distribution of transcriptional pause genes in a variety of cell lines and establishes a computational method for predicting pause genes. It provides a basis and a method for in-depth study of the role of transcriptional pause in the regulation of gene expression in various biological processes. The thesis completed the following work.

First, by integrating existing software, methods, and scripts, a pipeline for identifying transcriptional pause genes from GRO-seq data was built: groPIA (Identification and analysis of transcriptional pause genes based on GRO-seq data). The pipeline runs on Linux and includes modules for data preprocessing, identification of transcriptional pause genes, and downstream gene enrichment analysis. Its effectiveness was verified in breast cancer cell lines and the HeLa cell line.

Using groPIA and GRO-seq data, we mined transcriptional pause genes in six breast cancer cell lines and the human mammary epithelial cell line MCF10A, and identified pause preference genes. More genes were paused in the breast cancer cell lines: on average, 33.7% of protein-coding genes were paused in the cancer cells, compared with 12.8% in MCF10A. GO analysis showed that most of the pause genes are involved in ribonucleoprotein complex biogenesis, ribosome biogenesis, rRNA metabolic process, and RNA splicing. The degree of pause and gene expression were found to be negatively correlated. To obtain sequence features of the pause genes, MEME was used to extract three motifs enriched in their sequences; these motifs are binding sites of the transcription factors KLF5, NRF1, and ELK4.

A limitation of groPIA in practice is that GRO-seq data are currently scarce. Because genome and ChIP-seq data are more abundant, this thesis developed a pause gene prediction method that uses genome and ChIP-seq data, named tcPIC (Classifier to identify transcriptional pause genes based on trans-acting factors and cis-acting elements). First, GRO-seq data were analyzed to identify transcriptional pause genes and pause-related motifs in the HeLa cell line. Then, combined with ChIP-seq data for NELF and DSIF, motif frequency, motif matching score, and the sequencing abundance of transcription factors were used as features. Logistic regression and support vector machine classification models were built to classify the transcriptional pause state of genes. Using only the motif features, the accuracies were 75.8% and 76.8%, respectively; using only the transcription factor features, 86.1% and 86.5%; and using motif and transcription factor features together, 87.5% and 88.1%. Therefore, in practical studies of transcriptional pause genes, either groPIA or tcPIC can be selected for identification and analysis, depending on the data available.
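
The classification step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the feature names (`motif_frequency`, `motif_match_score`, `chip_signal`), the synthetic data generator, and the training settings are all assumptions chosen for illustration, and only logistic regression is shown, written in plain Python so the example is self-contained. Real features would likely need scaling, and the accuracy printed here says nothing about the results reported in the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (paused) or 0 (not paused) for one feature vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def make_toy_genes(n, seed):
    """Hypothetical stand-in data, NOT real GRO-seq or ChIP-seq values.

    Each gene is [motif_frequency, motif_match_score, chip_signal],
    drawn so that paused genes tend to score higher on all three.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        paused = rng.random() < 0.5
        shift = 1.0 if paused else -1.0
        X.append([
            rng.gauss(shift, 1.0),        # motif_frequency (assumed, standardized)
            rng.gauss(shift, 1.0),        # motif_match_score (assumed, standardized)
            rng.gauss(2.0 * shift, 1.0),  # chip_signal, e.g. NELF/DSIF abundance (assumed)
        ])
        y.append(1 if paused else 0)
    return X, y


X_train, y_train = make_toy_genes(200, seed=0)
X_test, y_test = make_toy_genes(100, seed=1)

w, b = train_logistic(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

Dropping a column from the toy feature vectors gives a rough analogue of the motif-only versus transcription-factor-only comparison in the abstract, though the numbers from synthetic data carry no biological meaning.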
Keywords/Search Tags: transcriptional pause, GRO-seq, breast cancer, predictive model