Protein-peptide interactions play an important role in understanding protein function and in drug discovery. Identifying protein-peptide binding sites is crucial for a deeper understanding of these interactions. With the development and widespread application of machine learning, computational prediction of protein-peptide interactions and their binding sites has become an important direction in biological research. However, most existing computational methods for predicting protein-peptide binding sites rely on third-party tools for hand-crafted feature design, which can lead to lower computational efficiency and poorer prediction performance. Moreover, no current method can simultaneously predict protein-peptide binary interactions and their binding sites. In addition, next-generation sequencing technologies have produced massive amounts of biological sequence data, including genomes, transcriptomes, and proteomes. To explore and analyze the relationship between biological sequences and their functions and structures, numerous machine-learning-based biological sequence analysis platforms have been developed. However, most of these platforms lack an automated workflow, offer few deep learning frameworks, and provide limited result analysis. A one-stop biological sequence function analysis platform built entirely on deep learning is therefore highly desirable, especially for researchers without a computer science background. This work focuses on protein-peptide binding data and high-throughput biological sequence data, and carries out the following research on interaction and binding site prediction and on the construction of a biological sequence function analysis platform:

1. To address the complex feature design required by existing protein-peptide binding site prediction methods, we propose PepBCL, an end-to-end prediction method based on a large-scale pre-trained model and contrastive learning. Specifically, we use a protein language model pre-trained on a large-scale dataset to automatically extract and learn high-quality representations related to protein structure and function. A contrastive learning strategy is then used to optimize the feature representation of binding sites in imbalanced datasets. On benchmark datasets, PepBCL outperforms existing sequence-based methods on all evaluation metrics and outperforms structure-based methods on most metrics. Furthermore, we analyze the attention mechanism in PepBCL to examine its ability to mine features of the residues surrounding binding sites in protein binding regions, which helps explain how the model predicts binding sites.

2. Because existing methods cannot simultaneously predict binary protein-peptide interactions and their binding sites, we present CPPIF, a method based on a multi-task learning strategy. Multi-task training lets the tasks assist one another, improving prediction performance compared with single-task training. To capture interaction information between proteins and peptides, we feed the protein and the peptide together into a pre-trained model for feature representation learning. Experimental results on a benchmark dataset show that our method outperforms existing methods in predicting both protein-peptide binary interactions and binding sites. Additionally, we explore the model's ability to mine potential protein-peptide interactions and validate the predictions through molecular docking experiments.

3. To overcome the limited deep learning support and result analysis of current biological sequence analysis platforms, we introduce DeepBIO, a deep learning computational platform for high-throughput biological sequence functional analysis. DeepBIO supports 42 deep learning algorithms, allowing researchers to choose an appropriate algorithm for their biological problem and to automatically conduct model training, optimization, and evaluation. DeepBIO also provides comprehensive visual analysis of results, including model interpretability, feature analysis, and functional sequence region discovery. In addition, DeepBIO supports 9 functional site annotation tasks and uses detailed visual analysis to validate the reliability of the annotated sites.
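The abstract does not spell out the exact contrastive objective used in PepBCL. As a minimal sketch, assuming a generic supervised contrastive (SupCon-style) loss over per-residue embeddings, which may differ from PepBCL's actual formulation, the idea of pulling residues of the same class together while pushing binding and non-binding residues apart can be written as below. The function names and toy embeddings are hypothetical; in a real pipeline the embeddings would come from the pre-trained protein language model.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (guarding against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Generic SupCon-style loss (an assumption, not necessarily PepBCL's).

    embeddings: hypothetical per-residue feature vectors.
    labels: 1 for a binding residue, 0 for a non-binding residue.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: every other residue that shares the anchor's label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # no same-class partner in this batch, skip the anchor
        # Scaled cosine similarity to every other residue (the denominator set).
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over the denominator set.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Labels that match the embedding clusters give a much lower loss.
    print(supervised_contrastive_loss(emb, [1, 1, 0, 0]))
    print(supervised_contrastive_loss(emb, [1, 0, 1, 0]))
```

Averaging over anchors lets the majority (non-binding) class dominate the loss, so a practical setup for imbalanced data would likely re-sample or re-weight binding residues within each batch; that step is deliberately left out of this sketch.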