
Research Of Protein-protein Interactions Prediction Based On Deep Learning

Posted on: 2020-08-26 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhang
GTID: 2370330599956771 | Subject: Computer application technology
Abstract/Summary:
Proteins are the molecular basis of life and play crucial roles in cellular activities. Most proteins interact with other proteins to accomplish diverse biological functions. The study of protein-protein interactions (PPIs) is both a hot topic and a difficult problem in proteomics research. Identifying true protein-protein interactions not only provides insight into the physiological processes of living cells, but is also important for drug development and for exploring disease mechanisms. Traditional wet-lab techniques for detecting PPIs are time-consuming, expensive, and limited in coverage. In recent years, researchers have developed computational approaches that use machine learning and protein amino acid sequences to recognize PPIs. However, these approaches have some shortcomings: 1) numeric encoding methods for amino acid sequences cannot fully extract the information relevant to interactions; 2) they ignore the complementary information between descriptors and classifiers, that is, they use only a single descriptor and a single classifier to predict protein-protein interactions; 3) the dataset of non-interacting protein pairs contains noise. To address these problems, this thesis works toward computational PPI prediction and makes the following contributions:

(1) To solve the problem that numeric encoding methods for amino acid sequences cannot fully extract interaction information, we propose a sequence-based method called DNN-LCTD, which combines a novel local conjoint triad descriptor (LCTD) with deep neural networks (DNNs) for PPI prediction. LCTD incorporates the advantages of the local descriptor (LD) and the conjoint triad descriptor (CT); thus, it better accounts for interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also discover hierarchical representations of the data. Applied to the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves an accuracy of 93.12%, a precision of 93.75%, and an area under the receiver operating characteristic curve (AUC) of 97.92%, and it needs only 718 seconds. Experimental results show that DNN-LCTD predicts protein-protein interactions efficiently and accurately, and that LCTD is superior to other encoding algorithms for amino acid sequences.

(2) To solve the problem that existing computational solutions ignore the complementary information between amino acid descriptors and PPI prediction classifiers, we propose a method called EnsDNN (Ensemble Deep Neural Networks) for detecting PPIs based on an ensemble strategy. EnsDNN separately uses the auto covariance descriptor, the local descriptor, and the multi-scale continuous and discontinuous local descriptor to represent and explore the pattern of interactions between sequentially distant but spatially close amino acid residues. It then trains DNNs with different configurations on each descriptor. Next, EnsDNN integrates these DNNs into an ensemble predictor to leverage the complementary information of the descriptors and of the DNNs, and to predict potential PPIs. EnsDNN achieves an accuracy of 95.29%, a recall of 95.12%, and a precision of 95.45% on predicting PPIs of Saccharomyces cerevisiae.

(3) Protein-protein non-interaction data generally contains noise. To address this problem, we propose two novel approaches, NIP-SS and NIP-RW, for generating high-quality non-interacting pairs. NIP-SS and NIP-RW select non-interacting protein pairs based on sequence similarity and on random walks on a graph, respectively. NIP-SS first calculates the sequence similarity between proteins in the protein-protein interaction dataset, then selects the top-m dissimilar protein pairs as negative examples, while keeping the degree distribution of the selected proteins similar to that of the proteins in the positive set (the set of interacting pairs). NIP-RW performs random walks on the PPI network to update the adjacency matrix of the network, and then selects protein pairs that are not connected in the updated network as negative samples. For efficiency, we use the auto covariance (AC) descriptor to extract information from amino acid sequences and employ DNNs as the classifier. Experimental results show that the negative datasets generated by NIP-SS and NIP-RW reduce bias and generalize well. In addition, models trained with the negative datasets generated by NIP-SS and NIP-RW yield better PPI prediction performance than models trained with negative datasets generated by other strategies.
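The abstract describes LCTD only as combining the local descriptor (LD) with the conjoint triad descriptor (CT). A minimal sketch, assuming LCTD computes the CT vector on each of ten local regions and concatenates the results. The seven amino acid classes below are the standard conjoint triad grouping; the region scheme in the hypothetical `local_regions` helper is an illustrative choice (contiguous regions only), not necessarily the thesis's exact partition, and the names `conjoint_triad`, `lctd`, and `encode_pair` are likewise hypothetical.

```python
# Standard seven-class grouping of the 20 amino acids used by the conjoint
# triad (CT) method (grouped by dipole and side-chain volume).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CT_CLASSES) for aa in group}


def conjoint_triad(seq):
    """343-dim CT vector: relative frequency of each class triad in `seq`."""
    # Map residues to class indices 0..6, skipping non-standard letters.
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    n_triads = max(len(classes) - 2, 0)
    counts = [0.0] * 343  # 7 * 7 * 7 possible class triads
    for i in range(n_triads):
        a, b, c = classes[i:i + 3]
        counts[a * 49 + b * 7 + c] += 1.0
    # Normalize by the triad count (one simple choice; other normalizations exist).
    return [x / n_triads for x in counts] if n_triads else counts


def local_regions(seq):
    """HYPOTHETICAL 10-region split built from four equal-length quarters A..D."""
    q = len(seq) // 4
    a, b, c, d = seq[:q], seq[q:2 * q], seq[2 * q:3 * q], seq[3 * q:]
    # A, B, C, D, AB, BC, CD, ABC, BCD, and the whole sequence.
    return [a, b, c, d, a + b, b + c, c + d, a + b + c, b + c + d, seq]


def lctd(seq):
    """LCTD-style vector: CT computed on each local region (10 * 343 dims)."""
    features = []
    for region in local_regions(seq):
        features.extend(conjoint_triad(region))
    return features


def encode_pair(seq1, seq2):
    """Feature vector for a protein pair: concatenation of both LCTD vectors."""
    return lctd(seq1) + lctd(seq2)
```

Each protein pair then becomes a fixed-length vector (6860 values here) that a DNN classifier can consume regardless of sequence length, which is the main reason sequence descriptors like CT are used in front of neural networks.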
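The auto covariance (AC) descriptor used in EnsDNN and in the negative-sampling experiments measures how a residue property at position i co-varies with the same property `lag` positions downstream. A minimal sketch of the usual formulation, AC(lag, j) = 1/(L - lag) · Σ_i (X[i][j] - mean_j)(X[i+lag][j] - mean_j), assuming the caller supplies a table of physicochemical property values per amino acid (AC is commonly used with seven properties and lags up to 30). The table contents, and the helper names `standardize` and `auto_covariance`, are assumptions for illustration, not taken from the thesis.

```python
from statistics import mean, pstdev


def standardize(table):
    """Z-score each property across all amino acids in `table` (aa -> list of values)."""
    n_props = len(next(iter(table.values())))
    out = {aa: [0.0] * n_props for aa in table}
    for j in range(n_props):
        column = [values[j] for values in table.values()]
        mu, sd = mean(column), pstdev(column)
        for aa, values in table.items():
            out[aa][j] = (values[j] - mu) / sd if sd else 0.0
    return out


def auto_covariance(seq, table, max_lag=30):
    """AC vector with one entry per (lag, property), lag = 1..max_lag."""
    rows = [table[aa] for aa in seq if aa in table]  # skip residues missing from the table
    L = len(rows)
    n_props = len(next(iter(table.values())))
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            if lag >= L:
                features.append(0.0)  # sequence too short for this lag
                continue
            column = [r[j] for r in rows]
            mu = sum(column) / L
            cov = sum((column[i] - mu) * (column[i + lag] - mu) for i in range(L - lag))
            features.append(cov / (L - lag))
    return features
```

For example, with a toy one-property alphabet `{"A": [1.0], "B": [-1.0]}`, the alternating sequence `"ABAB"` gives negative covariance at odd lags and positive covariance at even lags, which is the kind of periodic pattern AC is meant to expose. The output length is `max_lag * n_props` (210 for seven properties and lag 30).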
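The abstract does not say how EnsDNN combines its member networks. As a stand-in, the sketch below uses plain soft voting: each member (a DNN trained on one descriptor with one configuration) outputs an interaction probability, and the ensemble averages them. This is a simpler technique than a learned combiner and is named here only to illustrate the idea of pooling complementary descriptors and classifiers; `ensemble_predict` and the descriptor keys are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A trained member network: maps an encoded protein-pair vector to P(interaction).
Predictor = Callable[[Sequence[float]], float]


def ensemble_predict(members: List[Tuple[str, Predictor]],
                     features: Dict[str, Sequence[float]],
                     threshold: float = 0.5) -> Tuple[float, bool]:
    """Soft voting: average member probabilities, then apply a decision threshold.

    members: (descriptor_name, predictor) pairs; several members may share a
        descriptor but differ in network configuration.
    features: descriptor_name -> encoded feature vector of the protein pair.
    """
    probs = [predict(features[name]) for name, predict in members]
    p = sum(probs) / len(probs)
    return p, p >= threshold
```

Because each member sees a different encoding, their errors are less likely to coincide, so averaging can correct individual mistakes; a learned combiner (for example, another network on top of the members' outputs) is a common alternative.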
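NIP-SS, as summarized above, picks the m most dissimilar protein pairs as negatives while keeping the negative set's degree distribution close to that of the positive set. A minimal sketch under two assumptions: sequence similarity is approximated by the Jaccard index of 3-mer sets (a stand-in; the thesis may use alignment-based similarity), and the degree constraint is enforced greedily by capping each protein's negative degree at its positive degree. `kmer_similarity` and `nip_ss_sketch` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def kmer_similarity(s1, s2, k=3):
    """Stand-in similarity: Jaccard index of the two sequences' k-mer sets."""
    a = {s1[i:i + k] for i in range(len(s1) - k + 1)}
    b = {s2[i:i + k] for i in range(len(s2) - k + 1)}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nip_ss_sketch(seqs, positives, m, sim=kmer_similarity):
    """Pick up to m least-similar non-interacting pairs among proteins of the positive set."""
    positive_set = {frozenset(pair) for pair in positives}
    degree = Counter(p for pair in positives for p in pair)
    candidates = sorted(
        (sim(seqs[u], seqs[v]), u, v)
        for u, v in combinations(sorted(degree), 2)
        if frozenset((u, v)) not in positive_set
    )  # least similar first; ties broken by protein ID
    used = Counter()
    negatives = []
    for _, u, v in candidates:
        if len(negatives) == m:
            break
        # Cap each protein's negative degree at its positive degree, so the
        # negative set's degree distribution tracks that of the positive set.
        if used[u] < degree[u] and used[v] < degree[v]:
            negatives.append((u, v))
            used[u] += 1
            used[v] += 1
    return negatives
```

The degree cap matters because, without it, a few highly dissimilar proteins could dominate the negative set, and a classifier could then learn "this protein appears mostly in negatives" instead of learning interaction patterns.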
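NIP-RW updates the PPI network's adjacency matrix by random walks and then treats pairs that remain unconnected as negatives. The abstract does not give the update rule, so the sketch below assumes one simple version: the "updated" network is S = P + P^2 + ... + P^steps, where P is the row-normalized transition matrix, and a pair is a negative candidate if its walk probability is zero in both directions. The walk length, the zero threshold, and the names `matmul` and `random_walk_negatives` are assumptions for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def random_walk_negatives(nodes, edges, steps=3):
    """Return node pairs with zero walk probability in either direction within `steps` steps."""
    idx = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[idx[u]][idx[v]] = adj[idx[v]][idx[u]] = 1.0
    # Row-normalized transition matrix P (isolated nodes keep an all-zero row).
    deg = [sum(row) for row in adj]
    P = [[x / deg[i] if deg[i] else 0.0 for x in adj[i]] for i in range(n)]
    # Updated network: S = P + P^2 + ... + P^steps.
    S = [[0.0] * n for _ in range(n)]
    Pt = P
    for t in range(steps):
        S = [[S[i][j] + Pt[i][j] for j in range(n)] for i in range(n)]
        if t + 1 < steps:
            Pt = matmul(Pt, P)
    return [
        (nodes[i], nodes[j])
        for i in range(n)
        for j in range(i + 1, n)
        if S[i][j] == 0.0 and S[j][i] == 0.0
    ]
```

On a path network A-B-C-D-E with `steps=2`, only pairs three or more hops apart, `("A", "D")`, `("A", "E")`, and `("B", "E")`, are returned. Longer walks connect more of the network and so yield fewer, more conservative negatives; the dense matrix product here is only practical for small graphs.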
Keywords/Search Tags: Protein-protein Interaction, Deep Learning, Amino Acid Sequences, Numeric Encoding Methods of Amino Acids, Negative Data