Analysis And Prediction Of Replication Origins In Saccharomyces Cerevisiae Genomes

Posted on: 2021-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 1480306548475604
Subject: Biophysics

Abstract/Summary:
A complete and accurate DNA replication process is essential for living organisms. The specific site where DNA replication initiates and double-stranded DNA starts unwinding is termed the replication origin (ORI). Characterizing and identifying ORIs is a critical step toward elucidating the molecular mechanism of DNA replication. With the rapid development of sequencing technology, sequence data have grown exponentially, and mining valuable information, building models, and discovering rules from these massive data sets have become central tasks of bioinformatics research. In this thesis, the sequence characteristics of the replication origins of Saccharomyces cerevisiae were analyzed at three levels: the whole genome, the population genome, and the inter-species genomes of yeast. Based on the Z-curve methodology combined with a machine-learning method, a novel prediction pipeline named Ori-Finder 3 was built for genome-wide prediction of potential replication origins in S. cerevisiae, and a database of the predicted replication origins was constructed.

In the first part, the conservation of replication origins was analyzed at both the genome-wide level and the population genomics level. The results showed that 94.32% of ARSs (autonomously replicating sequences) are unique across the whole genome of S. cerevisiae S288C, and that those with high sequence similarity tend to be located in subtelomeres. A large-scale comparison with the ORIs of diverse budding yeast strains was then conducted from a population genomics perspective. The results showed that 82.5% of ARSs are conserved not only in genomic sequence but also, to a large extent, in chromosomal position. The non-conserved ARSs tend to be distributed in the subtelomeric regions. A pan-genome analysis of ARSs among the S. cerevisiae strains identified a total of 183 core ARSs present in all yeast strains. Genes adjacent to replication origins across the S. cerevisiae population were extracted; the results showed that genes involved in the initiation of DNA replication, such as orc3, mcm2, mcm4, mcm6 and cdc45, are conserved in their location adjacent to replication origins. Furthermore, the genes adjacent to conserved ARSs are significantly enriched in DNA binding, enzyme activity, transport and energy, whereas the genes adjacent to non-conserved ARSs are significantly enriched in response to environmental stress, metabolite biosynthetic processes and biosynthesis of antibiotics.

In the second part, the evolution of replication origins was analyzed. The results showed that, among the representative strains of 26 clades drawn from 1,011 S. cerevisiae strains, the Chinese-related strains show the greatest genetic diversity. A principal component analysis of the SNPs (single nucleotide polymorphisms) in the replication origin regions of the 1,011 S. cerevisiae strains showed that the non-Chinese-origin strains cluster together, which supports a single "out-of-China" origin for this species. A phylogenetic tree of replication origins was built from the homologous ARSs among the S. cerevisiae population; it showed that strains with different ecological and geographic origins cluster in divergent clades. Subsequently, a comparative genomics analysis was carried out across the Saccharomyces species, and the homologous replication origins were extracted from the multiple sequence alignments. The results showed that species at a greater evolutionary distance from S. cerevisiae have a higher SNP frequency, and that these SNPs are not evenly distributed. For some ARS sequences, mutations in the homologous ACS motif among different yeast species might be responsible for differences in replication timing profiles.

In the third part, a novel pipeline for the computational prediction of ARSs in S. cerevisiae genomes was described. First, ACS motif scanning and AT-rich sequence segmentation based on the Z-curve methodology were conducted to identify candidate ARSs. Subsequently, a machine-learning method was used to filter the candidate ARSs. On this basis, a user-friendly web server, Ori-Finder 3, was developed for the genome-wide computational prediction of replication origins in S. cerevisiae; it can be freely accessed at http://tubic.tju.edu.cn/Ori-Finder3. In addition, Ori-Finder 3 was used to predict replication origins in a hundred S. cerevisiae genomes, and these predictions were collected to construct the ARS database, which could provide useful data for further mining of sequence features.
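
As a rough illustration of the Z-curve idea mentioned in the third part, the sketch below computes the three standard cumulative Z-curve components of a DNA sequence and uses the z component (A+T excess over G+C) to locate the most AT-rich fixed-width window. It is not the segmentation procedure of Ori-Finder 3, which the abstract does not specify; the function names `z_curve` and `most_at_rich_window` and the fixed-width window scan are assumptions made only for this example.

```python
def z_curve(seq):
    """Return the cumulative Z-curve components (xs, ys, zs) of a DNA sequence.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (G, C)
    """
    x = y = z = 0
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base == "A":
            x += 1; y += 1; z += 1
        elif base == "C":
            x -= 1; y += 1; z -= 1
        elif base == "G":
            x += 1; y -= 1; z -= 1
        elif base == "T":
            x -= 1; y -= 1; z += 1
        # Any other symbol (e.g. N) leaves all three components unchanged.
        xs.append(x); ys.append(y); zs.append(z)
    return xs, ys, zs


def most_at_rich_window(seq, width):
    """Hypothetical helper: 0-based start and AT excess of the most AT-rich window.

    The AT excess of a window is the rise of the z component across it.
    Ties are resolved in favour of the leftmost window.
    """
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and len(seq)")
    _, _, zs = z_curve(seq)
    best_start, best_gain = 0, zs[width - 1]
    for start in range(1, len(seq) - width + 1):
        gain = zs[start + width - 1] - zs[start - 1]
        if gain > best_gain:
            best_start, best_gain = start, gain
    return best_start, best_gain


if __name__ == "__main__":
    # Toy sequence: an AT-rich core flanked by GC-rich ends.
    print(most_at_rich_window("GCGCATATATGCGC", 6))
```

In a real pipeline, a scan of this kind would only nominate candidate regions; the abstract states that candidates are additionally required to pass ACS motif scanning and a machine-learning filter.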
Keywords/Search Tags: DNA replication, Replication origin, Saccharomyces cerevisiae, Autonomously replicating sequence, Genome-wide analysis, Population genomics, Saccharomyces sensu stricto, Evolutionary conservation