
Prediction Of Enhancer-Promoter Interactions Based On Deep Learning

Posted on: 2022-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: C M Ye
Full Text: PDF
GTID: 2530306323472104
Subject: Computer technology
Abstract/Summary:
In computational biology, accurately identifying the three-dimensional organization of the genome is important. Predicting enhancer-promoter interactions (EPIs), for example, is important for understanding gene regulation and the mechanisms of disease. In recent years, computational methods based on machine learning have been widely used to predict EPIs because of their good performance. Although existing models have achieved some success, several problems remain: the models learn only limited sequence information; their structures are simple, which makes it difficult to extract effective features; and they capture little of the interaction information between the sequences. To address these problems, this thesis predicts EPIs with deep learning, which can improve the prediction accuracy and training speed of the model. The main research contents are as follows.

First, we propose a new model named EP2bert to predict enhancer-promoter interactions. EP2bert is first pre-trained on the human genome using the BERT language model to learn feature representations of DNA sequences. It then extracts the feature representations of the promoter sequences and the enhancer sequences separately and trains a GBRT classifier to predict EPIs in a supervised setting. A baseline dataset (six cell lines) was used to evaluate the performance of EP2bert against existing methods. The comparative results show that EP2bert is competitive with existing models in multiple cell lines and is computationally faster.

Second, we propose a novel method, termed EPI-DLMH, for predicting EPIs from DNA sequences only. EPI-DLMH consists of three major steps. First, a two-layer convolutional neural network learns local features, and a bidirectional gated recurrent unit network captures long-range dependencies in the promoter and enhancer sequences. Second, an attention mechanism focuses on the relatively important features. Finally, a matching heuristic mechanism is introduced to explore the interaction between enhancers and promoters. We evaluate the proposed method on benchmark datasets and compare it with existing methods. The comparative results show that our model outperforms existing models in all cell lines.

Both models proposed in this thesis perform well and have different advantages. EP2bert learns the global context information of DNA sequences based on BERT, while EPI-DLMH extracts more effective features and sequence interaction information through a hybrid neural network and the matching heuristic.
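
The abstract does not spell out how the attention and matching heuristic steps of EPI-DLMH are computed, so the following is only a minimal sketch under stated assumptions. It assumes that attention reduces the per-position features of a sequence to one vector by a softmax-weighted sum, and that the matching heuristic follows the common formulation of concatenating the two vectors with their element-wise difference and product. The function names `attention_pool` and `matching_heuristic`, the toy feature values, and the fixed attention scores are hypothetical; in a trained model the scores and features would be learned by the network.

```python
import math
from typing import List, Sequence

Vector = List[float]


def attention_pool(features: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Collapse per-position feature vectors into one vector.

    Assumption: the scores are turned into weights with a softmax and the
    result is the weighted sum of the feature vectors. Here the scores are
    passed in directly; a real model would compute them from the features.
    """
    if len(features) != len(scores):
        raise ValueError("one score is required per feature vector")
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def matching_heuristic(enhancer: Vector, promoter: Vector) -> Vector:
    """Combine two sequence vectors into one interaction vector.

    Assumption: the heuristic is the concatenation [e; p; e - p; e * p],
    where the difference and product are taken element-wise.
    """
    if len(enhancer) != len(promoter):
        raise ValueError("enhancer and promoter vectors must have equal length")
    diff = [e - p for e, p in zip(enhancer, promoter)]
    prod = [e * p for e, p in zip(enhancer, promoter)]
    return list(enhancer) + list(promoter) + diff + prod


# Toy values standing in for network outputs (hypothetical, not from the thesis).
enhancer_features = [[1.0, 0.0], [0.0, 1.0]]  # two positions, two features each
enhancer_scores = [0.0, 0.0]                  # equal scores give equal weights
enhancer_vec = attention_pool(enhancer_features, enhancer_scores)
promoter_vec = [1.0, 2.0]
matched = matching_heuristic(enhancer_vec, promoter_vec)
```

Under these assumptions, `matched` is four times as long as either input vector and would be passed to a final classification layer. The difference and product terms give that layer explicit signals about how the enhancer and promoter representations relate, rather than leaving it to infer them from the plain concatenation alone.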
Keywords/Search Tags: sequence prediction, pre-training, matching heuristic