Deep Learning-based Approach To Identify Enhancer-promoter Interactions

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Lang
GTID: 2480306740959999
Subject: Bio-engineering
Abstract/Summary:
Enhancers are important gene regulatory elements in the genome that regulate the transcription of target genes in a tissue-specific and spatially specific manner. Accurate identification of patterns in the three-dimensional genome, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. Traditional experimental methods are difficult to apply widely because of drawbacks such as low throughput and high cost. In recent years, researchers have developed a range of computational methods to identify EPIs, but these methods are also hard to apply broadly because EPIs are cell-line specific and cell line data are difficult to obtain. To overcome these problems, we developed a deep learning model for identifying EPIs in cell lines, which consists of two parts: pre-trained word vectors for DNA sequences and a Convolutional Neural Network (CNN) model. We first propose a pre-trained model called seq BERT to represent DNA sequence information, through which we obtain a word vector representation of the DNA sequence; a visual analysis shows that the model is able to capture implicit patterns in the DNA sequence. The pre-trained word vector representation of DNA sequences was then used as input to the CNN model, and a model called BERT-CNN was developed to identify EPIs in cell lines. [...] highest AUROC (all values above 0.9) and AUPR values, indicating superior predictive performance. Furthermore, we found that the DNA word vectors extracted from the seq BERT pre-training model substantially improved the prediction performance of the EPI identification model BERT-CNN, suggesting that our pre-training model can be used for sequence-related tasks and that pre-training techniques can be used to analyse biological sequences.
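The two-part design described in the abstract (word vectors for DNA sequences, then a CNN) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the thesis implementation: the abstract does not say how sequences are split into words, so overlapping 3-mers are assumed; `random_embedding_table` is a hypothetical stand-in for the pre-trained vectors; and the single filter uses arbitrary, untrained weights.

```python
import random


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words"). k=3 is an assumption."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def random_embedding_table(tokens, dim=4, seed=0):
    """Hypothetical stand-in for pre-trained word vectors: one random vector per distinct k-mer."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in sorted(set(tokens))}


def conv1d_maxpool(vectors, kernel):
    """Slide one convolution filter over the token vectors, apply ReLU, then global max pooling."""
    width = len(kernel)
    if len(vectors) < width:
        raise ValueError("sequence is shorter than the filter width")
    best = 0.0
    for i in range(len(vectors) - width + 1):
        window = vectors[i:i + width]
        # Dot product between the filter and the current window of token vectors.
        s = sum(w * x for row, vec in zip(kernel, window) for w, x in zip(row, vec))
        best = max(best, max(s, 0.0))
    return best


# Toy enhancer and promoter sequences (made up for this sketch).
enhancer = kmer_tokens("ACGTACGGTA")
promoter = kmer_tokens("TTGACGTCAA")
table = random_embedding_table(enhancer + promoter)
kernel = [[0.5] * 4, [-0.5] * 4]  # arbitrary, untrained filter of width 2
features = [conv1d_maxpool([table[t] for t in toks], kernel) for toks in (enhancer, promoter)]
```

In a trained model, the pooled features from the enhancer and promoter branches would feed a classifier that outputs an interaction score; here they only show how token vectors flow through a convolution and pooling step.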
Keywords/Search Tags: enhancer-promoter interactions, DNA sequences, convolutional neural networks, deep learning, pre-training technique