Prediction Of Meiotic Recombination Sites Based On Multi-source Information Fusion

Posted on: 2022-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: S J Song  Full Text: PDF
GTID: 2480306515971169  Subject: Biology
Abstract/Summary:
Meiotic recombination is a homology-dependent reorganization of genetic material that occurs between non-sister chromatids of homologous chromosomes during meiosis in eukaryotic cells. It is an important source of genetic diversity, and the crossovers formed during meiotic recombination ensure the correct segregation of chromosomes. Genomic regions with a high frequency of meiotic recombination are called “recombination hot spots”, and those with little or no recombination are called “recombination cold spots”. The distribution of hot spots and cold spots on chromosomes is not random; it is influenced by DNA features such as DNA sequence, DNA structure and DNA shape, as well as by non-DNA features such as epigenetic signals and Top2 binding signals. Accurate characterization and prediction of recombination hot spots has important implications for understanding the molecular mechanism of recombination and the pathogenesis of related molecular genetic diseases.

In this study, we constructed a benchmark dataset based on the Saccharomyces cerevisiae recombination map determined by Spo11-oligo-seq, analyzed the influence of DNA features and non-DNA features on recombination cold/hot spots, built a computational model that classifies recombination cold/hot spots with a Support Vector Machine (SVM), and determined the importance of each feature for distinguishing cold spots from hot spots. The results showed that both DNA and non-DNA features could characterize and distinguish recombination cold/hot spots to a certain extent. The DNA feature-based prediction model achieved an accuracy of 77.98%, whereas a model based on non-DNA features including H3K4me3, H3K56ac, MNase-seq signal, and Top2-binding signal achieved an accuracy of 89.73%, so non-DNA features (about 90%) predicted cold/hot spots more accurately than DNA features (about 78%).

We then compared our model with three other published models. The prediction accuracy of the three existing models is about 52%, which is much lower than ours (about 90%). The lower accuracy is probably because those models were trained on an ORF-based benchmark dataset and therefore cannot effectively classify the highly reliable hot/cold spots defined from Spo11-oligo-seq data.

We also constructed benchmark datasets based on the human recombination map and analyzed the influence of DNA features on human recombination sites. We made predictions on three benchmark datasets with sequence lengths of 5 kb and 2 kb. The results showed that DNA features of larger genomic regions (e.g., 5 kb) are strongly correlated with recombination hot spots. The prediction accuracy for recombination hot/cold spots in human is about 70%, which is much lower than that in budding yeast. The reduced accuracy might be associated with the higher regulatory complexity of recombination in human.

Our data showed that non-DNA features had a more significant effect on the distribution of recombination cold/hot spots in Saccharomyces cerevisiae than DNA features. Combined DNA features predict better than any single DNA compositional feature, DNA physical property feature, or DNA shape feature, and combined non-DNA features are more accurate than any single non-DNA feature. Existing models trained on an ORF-based benchmark dataset cannot discriminate with high accuracy the hot/cold spots defined on high-resolution Spo11-oligo-seq data.
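
The abstract does not say how DNA compositional features were encoded or how accuracy was computed. As a minimal sketch only, assuming k-mer frequencies as one common compositional encoding and accuracy as the fraction of correctly labeled regions, the two steps could look like the following. The function names `kmer_frequencies` and `accuracy` are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies of a DNA sequence.

    Assumption: k-mer composition stands in for the "DNA compositional
    feature" named in the abstract; the thesis may use a different encoding.
    The output has 4**k entries, ordered lexicographically over A, C, G, T.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols other than A/C/G/T (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def accuracy(y_true, y_pred):
    """Fraction of regions whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy region, not real data: a 16-dimensional dinucleotide feature vector.
    features = kmer_frequencies("ACGTACGTAA", k=2)
    print(len(features))
    # Hypothetical labels: 1 = hot spot, 0 = cold spot.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Feature vectors of this kind could then be passed to any SVM implementation; the choice of kernel and parameters is not stated in the abstract and is left open here.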
Keywords/Search Tags: DNA sequence, DNA shape, Epigenetic signals, SVM