| Accurate identification of DNA enhancer activity is important for understanding gene expression, biological development, and drug research. DNA enhancer sequences contain DNA fragments, called transcription factor binding sites, that bind transcription factors. This paper proposes a cross-domain transfer learning method, ResNest-STARR, which extracts prior knowledge from training on transcription factor binding sites in order to predict DNA enhancer activity. The main contributions are as follows: (1) In terms of biological features, more valuable biological clues are provided for the enhancer activity prediction task. In addition to one-hot encoding of DNA sequences, molecular dynamics features and electrostatic potential energy features of DNA are introduced to characterize the spatial structure of DNA. A contribution analysis confirms that the two features affect enhancer activity differently, with the molecular dynamics features being the more biologically significant for the enhancer activity prediction task. (2) In terms of network architecture, a high-precision deep learning framework, ResNest-STARR, is provided. The framework uses ResNest blocks, each consisting of a splitting module, a channel attention mechanism, and a residual structure, to capture interactions between features and multi-channel representations. In addition, three optimization strategies are designed (Huber loss, loss weight adjustment, and learning-rate warm-up), and experiments demonstrate that applying the three together improves the generalization ability of the model. (3) In terms of learning strategy, general guidelines for prediction across biological components are proposed. For the first time, a cross-domain transfer learning approach is used to apply prior knowledge learned from transcription factor binding sites to the task of predicting Drosophila DNA enhancer
activity. This approach supplies information about the latent features of transcription factor binding sites and substantially improves the performance of ResNest-STARR on the benchmark dataset: the Pearson correlation coefficients reach 69.1% and 77.2% on the developmental and housekeeping enhancer activity prediction tasks, respectively, which are 2.2% and 2.1% higher than those of the state-of-the-art method DeepSTARR. ResNest-STARR also outperforms the current state-of-the-art method on both the artificial mutation and the synthetic enhancer activity prediction tasks in Drosophila cells. (4) In terms of biological applications, ResNest-STARR is successfully extended to human cis-regulatory elements. When ResNest-STARR is further transferred to human enhancer activity and human promoter prediction tasks, its performance exceeds that of the current state-of-the-art methods DeepSTARR and iPro-WAEL. The experimental results demonstrate the effectiveness and scalability of ResNest-STARR. They also show that the proposed method not only offers new ideas for studying the similarities and differences of biological components across species, but also provides a theoretical basis for understanding biodiversity and species evolution. |
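
Several building blocks named in the abstract (one-hot encoding of DNA, the Huber loss, a learning-rate warm-up, and the Pearson correlation coefficient used for evaluation) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the function names, the handling of unknown bases, and the defaults `delta=1.0`, `base_lr=1e-3`, and `warmup_steps=100` are assumptions chosen for illustration, and the linear warm-up shape is only one common choice.

```python
import math

BASES = "ACGT"  # assumed channel order; the paper may use a different one


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (e.g. N) become all-zero rows (an assumption).
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


def warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    """Linear warm-up: ramp the learning rate up to base_lr, then hold it."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)
```

In a training loop, `one_hot` would produce the sequence input, `huber_loss` would score predicted against measured activity, `warmup_lr(step)` would set the optimizer's learning rate at each step, and `pearson_r` would be computed on held-out sequences. The loss weight adjustment between the two prediction tasks is not shown, since the abstract does not specify how the weights are set.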