Analysis Of Transcription Initiation Sites At Different Time Points Of Human Cytomegalovirus Infection In HELF Cells

Posted on: 2022-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: X J Meng
GTID: 2480306563956439
Subject: Basis of Obstetrics and Gynecology

Abstract/Summary:
Objective: To analyze the characteristics of, and differences between, transcription start sites (TSSs) in human embryonic lung fibroblasts (HELF) infected with the HCMV Han-BAC strain at 12 hpi and 72 hpi, using CAGE-seq (cap analysis of gene expression combined with deep sequencing).

Methods: HELF cells were infected with the HCMV Han-BAC strain at MOI = 5, and total RNA was harvested at 12 hpi and 72 hpi. CAGE combined with next-generation sequencing was used to capture TSSs globally, and their characteristics at 12 hpi and 72 hpi were compared.

Results: Total RNA was prepared and sequenced by DeepCAGE-seq, and the data were analyzed.

1) Human CAGE tags were mapped to the human genome (hg38/GRCh38) using Bowtie2, allowing 2 mismatches. The tags were heavily clustered, favoring the 5' UTR and the 5' end of genes, as expected. In addition, the previously confirmed TSSs of HCMV Han-BAC were recovered by the CAGE tag clusters (TCs), supporting the reliability of the HCMV Han-BAC CAGE libraries.

2) The TSSs were mapped to the HCMV Han genome (GenBank: KJ426589.1). HCMV Han-BAC TCs were widely distributed across the genome, and the background of TSSs increased noticeably during late infection, probably because of pervasive transcription. Using the abundance of previously confirmed HCMV Han-BAC TSSs as a reference, only TCs with a TPM of at least 5 were analyzed. TCs with TPM > 5 were divided into 7 classes according to their position relative to confirmed TSSs, TTSs and ORFs, and the number and mean expression abundance of TCs in each class were calculated. TCs aggregated markedly in the CDS and antisense classes, and the abundance of TCs upstream of ORFs was significantly higher than that of the other classes.

3) TCs supported by more than 100 CAGE tags, corresponding to a TPM greater than 207.79 in the HELF 12 h library and greater than 23.43 in the HELF 72 h library, were selected, in accordance with previous studies. These TCs were grouped by TPM range, and the number of TCs at each decile spacing was counted. In the HELF 12 h library, TCs clustered significantly at a decile spacing of 1 nt for TPM values between 207.79 and 500, and at 1-8 nt for TPM values above 500. In the HELF 72 h library, TCs clustered at a decile spacing of 1 nt, 1-5 nt and 1-5 nt for TPM values between 23.43 and 100, between 100 and 500, and above 500, respectively. These clustered TCs represent a class of TSSs with a defined base position or a narrow range, and were named single dominant peak (SP) TCs. The remaining TCs were classified as broad with a single dominant peak (DP), broad with bi- or multi-peak (MP), or generally low abundance with no dominant peak (GB). In the HELF 12 h library, 92 (36.7%), 92 (36.7%), 42 (16.7%) and 25 (9.9%) TCs were classified as SP, DP, MP and GB, respectively; in the HELF 72 h library, 116 (17.8%), 94 (14.4%), 43 (6.6%) and 398 (61.1%) TCs were classified as SP, DP, MP and GB, respectively.

4) The most abundant TSS in each TC was designated the dominant TSS, and its position was used to represent the position of the TC. The sequence ±50 nt flanking the dominant TSS was defined as the core promoter. MEME was used for motif analysis, and Tomtom was used to compare the newly identified motifs with a human motif database. Analysis of the 92 SP core promoter sequences from the HELF 12 h library identified a significant motif, TSTATAWAAR (E-value 5.1e-021), and 14 motifs in the human motif database matched it significantly, including TBP (P-value 5.84e-06, E-value 2.34e-03, q-value 4.68e-03). Analysis of the 116 SP core promoter sequences from the HELF 72 h library identified a significant motif, TATWWAA (E-value 2.5e-006), and 14 motifs in the human motif database matched it significantly, including TBP (P-value 2.15e-04, E-value 8.63e-02, q-value 1.73e-01). No significant motif was identified in the core promoters of typical DP, MP and GB TCs.

5) Based on the locations of 107 previously confirmed TSSs of HCMV Han-BAC genes, it can be inferred that most TSSs are located in the -500 to +100 region relative to the ORF. If the ratio between the highest peak and the secondary peak within the -500 to +100 region was > 2, the core promoter of the gene was classified as a single-dominant promoter; otherwise it was classified as a multi-dominant promoter. When the TPM of the primary peak in this region was less than 20, the promoter was classified as low abundance with no dominant peak. The differences in TSSs within the -500 to +100 region of the same gene between HELF 12 h and 72 h fell into two categories: 108 genes showed no change in TSS distribution or dominant peak position, differing only in abundance, and 19 genes showed significant changes in TSS distribution and dominant peak position.

6) Data analysis revealed a number of previously unconfirmed TCs in the CDS region. A total of 25 genes had TCs with a TPM greater than 100 within the CDS region, located more than 300 nt from the 3' end of the gene.

Conclusions: High-quality HCMV Han-BAC CAGE libraries were constructed by CAGE-seq. Highly abundant TCs clustered upstream of ORFs and were classified into four categories according to the aggregation pattern of TSSs within the TCs. Conserved motifs were found in the core promoters of SP TCs in both the HELF 12 h and HELF 72 h libraries. The distribution of TSSs in the -500 to +100 region of each gene was divided into three categories, and the differences in TSSs in this region of the same gene between HELF 12 h and 72 h were divided into two categories: 108 genes showed no change in TSS distribution or dominant peak position, differing only in abundance, and 19 genes showed significant changes in TSS distribution and dominant peak position. A total of 25 genes had TCs with a TPM greater than 100 within the CDS region, located more than 300 nt from the 3' end of the gene.
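
The promoter classification rule in Results 5) and the tag-to-TPM scaling behind the thresholds in Results 3) can be illustrated with a minimal Python sketch. This is not the thesis's code. The function names, the input layout (a mapping from TSS position relative to the ORF start to TPM), the treatment of each TSS position as one "peak", and the choice to test the low-abundance cutoff before the peak ratio are all assumptions made for illustration.

```python
from typing import Dict


def tags_to_tpm(tag_count: int, library_size: int) -> float:
    """Tags per million: a tag count scaled to a library of one million tags."""
    return tag_count * 1_000_000 / library_size


def classify_promoter(
    tss_tpm: Dict[int, float],
    low_abundance_cutoff: float = 20.0,  # primary-peak TPM cutoff from the abstract
    dominance_ratio: float = 2.0,        # highest/secondary peak ratio from the abstract
) -> str:
    """Classify a gene's core promoter from TSS abundances.

    tss_tpm maps TSS position relative to the ORF start (hypothetical
    layout) to TPM. Only positions within -500..+100 are considered.
    """
    # Keep abundances inside the -500 to +100 window, highest first.
    peaks = sorted(
        (tpm for pos, tpm in tss_tpm.items() if -500 <= pos <= 100),
        reverse=True,
    )
    # Assumption: the low-abundance test takes precedence over the ratio test.
    if not peaks or peaks[0] < low_abundance_cutoff:
        return "low-abundance"
    # Assumption: a lone peak (no secondary peak) counts as single-dominant.
    if len(peaks) == 1 or peaks[1] == 0:
        return "single-dominant"
    if peaks[0] / peaks[1] > dominance_ratio:
        return "single-dominant"
    return "multi-dominant"


if __name__ == "__main__":
    print(classify_promoter({-30: 150.0, -120: 40.0}))   # ratio 3.75
    print(classify_promoter({-30: 150.0, -120: 100.0}))  # ratio 1.5
    print(classify_promoter({-30: 12.0, -120: 3.0}))     # primary peak below 20
```

Under these assumptions the three calls return "single-dominant", "multi-dominant" and "low-abundance". A pipeline that first merges neighbouring TSSs into peaks would pass one summed TPM per merged peak instead of one per position.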
Keywords/Search Tags: human cytomegalovirus, CAGE-seq, transcription start site, core promoter, transcription regulation