Research On Intelligent Computing-based Methods For Protein-peptide Binding Prediction

Posted on:2024-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:R H Wang

Full Text:PDF

GTID:2530306923455964

Subject:artificial intelligence

Abstract/Summary:


Protein-peptide interactions play an important role in understanding protein function and in drug discovery. Identifying protein-peptide binding sites is crucial for a deeper understanding of these interactions. With the development and widespread application of machine learning, computational prediction of protein-peptide interactions and their binding sites has become an important direction in biological research. However, most existing computational methods for predicting protein-peptide binding sites rely on third-party tools for hand-crafted feature design, which lowers computational efficiency and limits prediction performance. Moreover, no current method predicts both protein-peptide binary interactions and their binding sites simultaneously. In addition, next-generation sequencing technologies have produced massive amounts of biological sequence data, including genomes, transcriptomes, and proteomes. To explore the relationship between biological sequences and their functions and structures, numerous machine learning-based biological sequence analysis platforms have been developed. However, most of these platforms lack an automated workflow, offer few deep learning frameworks, and provide limited result analysis. A one-stop, fully deep learning-based platform for biological sequence function analysis is therefore highly desirable, especially for researchers without a computer science background. Focusing on protein-peptide binding data and high-throughput biological sequence data, this thesis addresses interaction and binding site prediction and the construction of a biological sequence function analysis platform through the following work:

1. To avoid the intricate feature design required by existing protein-peptide binding site predictors, we propose PepBCL, an end-to-end prediction method based on a large-scale pre-trained model and contrastive learning. Specifically, we use a protein language model pre-trained on a large-scale dataset to automatically learn high-quality representations related to protein structure and function, and we apply a contrastive learning strategy to optimize the feature representation of binding sites in imbalanced datasets. On benchmark datasets, PepBCL outperforms existing sequence-based methods on all metrics and structure-based methods on most metrics. We further examine how the attention mechanism in PepBCL captures features of the residues surrounding binding sites in protein binding regions, which helps explain how the model predicts binding sites.

2. Because existing methods cannot simultaneously predict protein-peptide binary interactions and binding sites, we present CPPIF, a method based on a multi-task learning strategy. Multi-task training lets the two tasks assist each other, improving prediction performance over single-task training. To learn interaction information between proteins and peptides, we feed the protein and the peptide jointly into a pre-trained model for feature representation learning. Experimental results on a benchmark dataset show that our method outperforms existing methods in predicting both protein-peptide binary interactions and binding sites. We also explore the model's ability to uncover potential protein-peptide interactions and validate the predictions through molecular docking experiments.

3. To overcome the limited deep learning support and result analysis of current biological sequence analysis platforms, we introduce DeepBIO, a deep learning platform for high-throughput biological sequence functional analysis. DeepBIO supports 42 deep learning algorithms, letting researchers choose an algorithm suited to their biological problem and then automatically train, optimize, and evaluate the model. DeepBIO also provides comprehensive visual analysis of results, including model interpretability, feature analysis, and discovery of functional sequence regions. In addition, DeepBIO supports 9 functional site annotation tasks and uses detailed visual analysis to assess the reliability of the annotated sites.
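The abstract says PepBCL uses a contrastive learning strategy to improve binding-site representations on imbalanced data, but it does not state which contrastive objective is used. As a rough illustration only, the sketch below implements a generic supervised contrastive (SupCon-style) loss over per-residue embeddings, assuming label 1 marks a binding residue and 0 a non-binding one. The function names, the temperature of 0.1, and the toy vectors are hypothetical choices, not details taken from the thesis.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical value, not from the thesis
) -> float:
    """Generic supervised contrastive loss over residue embeddings.

    labels[i] is assumed to be 1 for a binding residue and 0 otherwise.
    Same-label residues are treated as positives, all other residues as
    contrast terms in the softmax denominator.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner is skipped
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # log of the softmax denominator, computed stably with the max trick
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # average negative log-probability assigned to this anchor's positives
        total += sum(log_denom - sims[p] for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = [1, 1, 0, 0]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # same-label residues close
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # same-label residues apart
print(supervised_contrastive_loss(clustered, labels) < supervised_contrastive_loss(mixed, labels))  # True
```

The loss is small when residues with the same label already lie close together and large when they do not. This is the property such a strategy relies on to keep the rare binding residues from being absorbed into the majority class.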
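For CPPIF, the abstract says the protein and the peptide are fed together into a pre-trained model so that interaction information can be learned. It names neither the model nor its input format. The sketch below therefore assumes a BERT-style convention with `[CLS]` and `[SEP]` tokens and one token per residue. It shows how a joint input could be built while recording which positions belong to each chain. The function `build_pair_input` and its span layout are hypothetical.

```python
from typing import Dict, List, Tuple

CLS, SEP = "[CLS]", "[SEP]"  # BERT-style special tokens (assumed convention)


def build_pair_input(protein: str, peptide: str) -> Tuple[List[str], Dict[str, range]]:
    """Concatenate a protein and a peptide into one residue-level token sequence.

    Returns the tokens plus the positions that hold each chain's residues, so
    per-token outputs can be mapped back to protein residues (site prediction)
    while the [CLS] position can summarize the pair (interaction prediction).
    """
    prot = list(protein.upper())
    pep = list(peptide.upper())
    tokens = [CLS] + prot + [SEP] + pep + [SEP]
    spans = {
        "protein": range(1, 1 + len(prot)),                       # after [CLS]
        "peptide": range(2 + len(prot), 2 + len(prot) + len(pep)),  # after first [SEP]
    }
    return tokens, spans


tokens, spans = build_pair_input("MKTA", "GLW")
print(" ".join(tokens))                       # [CLS] M K T A [SEP] G L W [SEP]
print([tokens[i] for i in spans["protein"]])  # ['M', 'K', 'T', 'A']
```

With both chains in one sequence, the model's self-attention can relate every protein residue to every peptide residue. Separate encodings of each chain would not provide this directly.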
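The multi-task strategy described for CPPIF trains one model on two related objectives: whether a protein-peptide pair interacts, and which residues form the binding site. The abstract does not give the loss. A common way to combine such tasks is a weighted sum of per-task losses, so that in a trained network both tasks update the shared encoder. The sketch below assumes binary cross-entropy for each task and a hypothetical balancing weight `site_weight`.

```python
import math
from typing import Sequence


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-7) -> float:
    """Negative log-likelihood of a 0/1 target under a predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multitask_loss(
    pair_prob: float,
    pair_label: int,
    site_probs: Sequence[float],
    site_labels: Sequence[int],
    site_weight: float = 1.0,  # hypothetical balancing hyperparameter
) -> float:
    """Interaction loss plus a weighted mean per-residue binding-site loss."""
    if len(site_probs) != len(site_labels):
        raise ValueError("site_probs and site_labels must have the same length")
    interaction = binary_cross_entropy(pair_prob, pair_label)
    if not site_probs:
        return interaction  # no residue labels: only the interaction task applies
    sites = sum(binary_cross_entropy(p, y) for p, y in zip(site_probs, site_labels))
    return interaction + site_weight * sites / len(site_probs)


good = multitask_loss(0.9, 1, [0.9, 0.1], [1, 0])  # sites predicted correctly
bad = multitask_loss(0.9, 1, [0.1, 0.9], [1, 0])   # sites predicted wrongly
print(good < bad)  # True
```

Setting `site_weight` to 0 reduces this to single-task interaction training, which is the baseline the abstract compares multi-task training against.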

Keywords/Search Tags:

Protein-peptide binding site prediction, Protein-peptide interaction prediction, Large-scale pre-trained model, Machine learning, Biological sequence analysis platform


Related items

1	Sequence-based Prediction For The Protein-peptide Binding Residues
2	Research On Feature Extraction Algorithm Of Functional Peptide Prediction Problem Based On BERT Pre-trained Model
3	A Study On Specific Protein-Peptide Interaction Prediction
4	Analysis And Prediction Of RNA-binding Residues In Protein Molecules
5	Intelligence Algorithms For Protein Structure Prediction And Nucleic Acids Binding Site Annotation
6	Research On Protein-ligand Binding Sites Prediction Based On Sequence Information
7	A Hierarchical Mixture Model For Predicting Protein Signal Peptide
8	Prediction Of Protein Secondary Structure And Interaction Based On Machine Learning
9	Prediction Of Antibody Fc Fragment Binding Peptides Based On Machine Learning
10	Research On Protein Function Prediction Algorithm Based On Network Analysis