Deep Learning-based Prediction Of Open Chromatin Regions And Interactions

Posted on: 2023-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Shen
GTID: 1520306842964239
Subject: Bioinformatics
Abstract/Summary:
Transcription and replication of eukaryotic chromatin require the combined action of specific cis-regulatory elements and trans-acting factors. These trans-acting factors can only bind to loosely folded, nucleosome-free regions of DNA, known as open chromatin regions (OCRs) or chromatin accessible regions. Because OCRs are structural features rich in regulatory elements, they are important for understanding the transcriptional regulatory mechanisms of the genome. In recent years, drawing on massive biological datasets, deep learning has produced fruitful results in the identification of genomic functional sites and the prediction of chromatin interactions. However, existing OCR prediction algorithms are mostly based on human or model animal data, and a computational method for identifying and predicting OCRs in plant genomes is still lacking. Meanwhile, traditional chromatin interaction prediction methods typically predict a single type of interaction, such as enhancer-promoter interaction. Exploring interactions between OCRs is not constrained by the types of anchors, which allows more types of interactions to be discovered. In response to these problems, this dissertation develops two deep learning-based computational tools: one for identifying OCRs in plant genomes and one for predicting interactions between OCRs in the human genome.

In the first work, this dissertation develops CharPlant (Chromatin Accessible Regions for Plant), which predicts all potential OCRs across a given plant genome. A new convolutional neural network (CNN) architecture was designed and built for this purpose, and the model was trained and tested using ATAC-seq and DNase-seq datasets from four plants (Oryza sativa, Arabidopsis thaliana, Medicago truncatula, and Solanum lycopersicum). To determine chromatin accessibility, the model simultaneously learns DNA sequence motif features and their regulatory logic. Furthermore, all computational steps are integrated into the CharPlant toolkit and can be run from a simple command line, and subsequent data analysis confirms the method's predictive ability and computational efficiency. Overall, CharPlant maps the genome-wide distribution of OCRs in plants and can help researchers explore OCR transcriptional regulatory mechanisms under various conditions.

In the second work, this dissertation develops CharID (Chromatin Accessible Region Interaction Detector) to predict all potential interactions between OCRs in the human genome. To that end, a two-step model was designed and built, and it was trained and tested using datasets from three human cell lines (GM12878, K562, and HeLa-S3). The first-step model, CharID-Anchor, is based on DNA sequence and combines a CNN with a bidirectional gated recurrent unit (BiGRU) in a hybrid, attention-based CNN-BiGRU architecture. By learning the features of OCRs involved in interactions, it divides sequences into anchor OCRs and non-anchor OCRs. Follow-up analyses showed that anchor OCRs carry more features favorable to the establishment of interactions than non-anchor OCRs. The second-step model, CharID-Loop, uses a gradient boosting decision tree (GBDT) and a chromosome-split strategy to predict potential interactions between OCRs from sequence features together with epigenome and gene expression data. A comparison with existing algorithms showed that both CharID-Anchor and CharID-Loop perform better and can predict more biologically significant chromatin interactions. Subsequently, a regulatory network of interactions between OCRs was built, and hubs rich in regulatory elements were identified from this network. In addition, this dissertation identifies SNP-target gene interactions linked to cardiovascular disease and explains the mechanisms by which SNPs regulate the GFOD1 gene. To further extend the usability of CharID, this dissertation develops Peaksniffer, an easy-to-use web server that allows users to predict, retrieve, and visualize OCR interactions online.

To summarize, this dissertation uses deep learning methods to develop two genome-wide OCR computational tools, CharPlant and CharID. It provides an in-depth investigation of OCRs as one-dimensional functional genomic sites and of their interactions in three-dimensional space, offering new perspectives and insights into OCRs and their related regulatory mechanisms.
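
The chromosome-split strategy mentioned for CharID-Loop can be illustrated with a minimal sketch. The abstract does not say which chromosomes are held out or how candidate OCR pairs are represented, so the record layout, the function name `chromosome_split`, and the choice of `chr2` as the test chromosome are all assumptions made for illustration. The idea shown is only the general one: every candidate pair from a held-out chromosome goes to the test set, so no chromosome contributes to both training and evaluation.

```python
def chromosome_split(pairs, test_chroms):
    """Split candidate OCR pairs so no chromosome is shared by train and test.

    `pairs` is a list of dicts with a "chrom" key (hypothetical layout);
    `test_chroms` names the chromosomes reserved for evaluation.
    """
    held_out = set(test_chroms)
    train, test = [], []
    for pair in pairs:
        # Route the whole pair by chromosome, never by individual sample.
        if pair["chrom"] in held_out:
            test.append(pair)
        else:
            train.append(pair)
    return train, test


# Toy, made-up candidate pairs: two OCR anchors plus an interaction label.
pairs = [
    {"chrom": "chr1", "anchor_a": 10_000, "anchor_b": 85_000, "label": 1},
    {"chrom": "chr1", "anchor_a": 40_000, "anchor_b": 90_000, "label": 0},
    {"chrom": "chr2", "anchor_a": 15_000, "anchor_b": 60_000, "label": 1},
    {"chrom": "chr2", "anchor_a": 22_000, "anchor_b": 71_000, "label": 0},
    {"chrom": "chr3", "anchor_a": 5_000, "anchor_b": 48_000, "label": 1},
]

train, test = chromosome_split(pairs, test_chroms=["chr2"])
print(len(train), "training pairs;", len(test), "test pairs")
```

Splitting by chromosome rather than at random is a common way to avoid overlapping or neighboring genomic regions appearing on both sides of the split, which could otherwise inflate measured performance. A classifier such as a GBDT would then be fitted on `train` and scored on `test`.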
Keywords/Search Tags: open chromatin region, deep learning, convolutional neural network, three-dimensional genomes, chromatin interactions