| Proteins usually perform their functions by physically contacting one or more other proteins, forming protein-protein complexes. Protein-protein interactions (PPIs) play crucial roles in cellular processes such as transcription, translation, and metabolic regulation. Therefore, the identification of PPI sites is of great importance for understanding many biological processes as well as the molecular mechanisms of disease pathogenesis, which contributes to the development of new therapeutics and the design of new drugs. Hot spots are compact, centralized regions of residues that are crucial for the affinity of protein-protein interactions and contribute most to the binding energy; they can be identified by experimental methods such as alanine scanning mutagenesis. Therefore, targeting hot spots at protein interfaces has enormous potential for the development of therapeutics and drugs. To detect PPI sites, various experimental and computational methods have been developed. Conventional experimental methods, such as yeast two-hybrid (Y2H) screening and affinity-purification mass spectrometry (APMS), are often time-consuming and very expensive. Therefore, the development of reliable computational methods is urgently needed. Currently, most PPI site prediction tools are machine learning-based approaches. According to the information they use, the existing prediction methods can be divided into two strategies: sequence-based and structure-based. Sequence-based methods use only protein sequence information to predict PPI sites, whereas structure-based approaches use both sequence and structure information. In comparison, structure-based methods show much better performance and greater potential for practical applications as the number of structurally resolved proteins grows rapidly. Moreover, with the rapid development of protein structure prediction algorithms, such as AlphaFold2 and RoseTTAFold, the barrier between sequence information and structure information has been gradually
breaking down, and studying proteins on the basis of their structures is therefore becoming a trend. In this study, we introduce Spatom, a novel structure-based PPI site prediction algorithm. (i) Spatom is built on the premise that whether an amino acid is a PPI site is determined by both the spatial local environment (SLE) around it and the spatial global environment (SGE) of the whole protein. (ii) Spatom transforms a protein into a digraph (the residue contacting graph) and treats the prediction of PPI sites as a graph node classification problem. Each edge of the digraph is weighted according to the spatial distance between the two corresponding amino acids, and structural, evolutionary, and physicochemical characteristics are extracted to form a feature vector for each node. (iii) Spatom employs graph convolution as its main unit, which effectively aggregates both SLE and SGE information into the nodes of the weighted digraph. Unlike traditional graph convolution on unweighted undirected graphs, a novel graph convolution architecture is specially designed for the weighted digraph. (iv) An improved graph self-attention layer is applied at the end of the network, which effectively alleviates the oversmoothing of graph convolutions while performing the function of the attention mechanism. The distance-based weighted digraph provides a reasonable portrayal of the spatial contacts between residues. Furthermore, the self-attention layer drives the predicted interaction sites to form a spatially continuous region, which leads Spatom to capture more biologically plausible residues and improves performance. We benchmarked Spatom on three experimentally resolved protein structure datasets containing 186, 72, and 164 proteins, and the results showed that this framework substantially improves performance over other structure-based and sequence-based methods. In addition, we used AlphaFold2 to generate predicted protein structures of the proteins in the testing datasets and
evaluated Spatom based on the predicted structures. Spatom also demonstrates much better performance than most prediction methods. Finally, we used the trained Spatom to study the interaction interfaces of the following proteins: the protease of HIV-1, which is the causative agent of AIDS; the spike protein of SARS-CoV-2, which is responsible for COVID-19; and P53 and MDM2, which are associated with tumour generation. Our case studies show that Spatom is very sensitive to hot spots, demonstrating its great potential for guiding the development of inhibitors targeting the binding of proteins. |
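
To make the idea of a distance-weighted residue digraph concrete, the following is a minimal sketch and not Spatom's actual implementation. It assumes one representative coordinate per residue (for example the C-alpha atom), a hypothetical 10 Å contact cutoff, and a hypothetical weighting of 1/(1 + d) that is normalised over each residue's outgoing edges. The abstract does not give the weighting formula; per-residue normalisation is used here only because it is one simple way for the edge i→j to differ from j→i, which is what makes the graph directed.

```python
import math


def contact_digraph(coords, cutoff=10.0):
    """Build a distance-weighted directed contact graph (illustrative only).

    coords: list of (x, y, z) tuples, one per residue.
    cutoff: hypothetical contact threshold in angstroms (an assumption).
    Returns {i: {j: weight}}; the outgoing weights of each residue sum to 1.
    """
    n = len(coords)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                # Hypothetical decay: closer residues get larger raw weights.
                graph[i][j] = 1.0 / (1.0 + d)
        # Normalise over the neighbours of residue i, so that in general
        # weight(i -> j) != weight(j -> i).
        total = sum(graph[i].values())
        if total > 0:
            for j in graph[i]:
                graph[i][j] /= total
    return graph


# Four residues on a line; the last one is too far away to contact the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_digraph(coords)
```

In this toy example residue 1 sits midway between residues 0 and 2, so its two outgoing edges carry equal weight, whereas residue 0 weights its nearer neighbour more heavily. A node classifier such as a graph convolutional network would then take this weighted adjacency together with per-residue feature vectors as input.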