Prediction Of Protein-protein Interactions Via Multi-modal Features Fusion

Posted on: 2019-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wen
GTID: 2370330566461596
Subject: Computer Science and Technology
Abstract/Summary:
As one of the most important macromolecules in the "central dogma" of genetics and molecular biology, proteins play a vital role in life; it can be said that there is no life without protein. Proteins often carry out their functions in pairs, so predicting protein-protein interactions (PPIs) is key to revealing protein function. Constructing protein interaction networks is likewise both a research hotspot and a difficult problem in protein science. The primary structure of a protein is a sequence composed of 20 kinds of amino acids, and a large amount of protein sequence information is now available to support PPI prediction. At the same time, artificial intelligence algorithms have developed rapidly, and computational methods have been widely studied for PPIs. The prediction of PPIs is therefore a valuable and promising research topic.

Many computational prediction models are now applied to PPI prediction, but the existing methods have shortcomings, such as excessively high feature dimensions, heavy redundancy, and performance that leaves room for improvement. More importantly, most methods focus on a single modality, and existing work has not effectively fused and encoded the multi-modal features of proteins or fully exploited them to extract more discriminative features. To address this, we propose PPI prediction models based on multi-modal feature fusion. The work consists of the following three parts.

First, in the feature extraction stage, we use three properties of amino acids. The first is the amino acid mutation rate, for which the protein sequence is represented with a block substitution matrix and processed with two-dimensional linear discriminant analysis (2DLDA). The second is hydrophobicity, which is encoded with a hydrophobicity index and then passed through the continuous wavelet transform (CWT) to produce a feature matrix of uniform size. The third is hydrophilicity, which is encoded with a hydrophilicity index and then passed through the discrete wavelet transform (DWT) to produce a feature matrix of uniform size. Extensive experiments show that all three feature extraction methods are effective.

Second, we propose a novel computational method that uses similarity network fusion (SNF) to fuse multi-modal features derived from the physical and chemical properties of proteins, and combines it with a label propagation algorithm (LPA) for PPI prediction. SNF integrates information across the individual similarity networks to take advantage of the complementary nature of multi-modal data. In this method, the amino acid mutation rate and hydrophobicity are considered jointly as the features of protein sequences. The experimental results show that the proposed method achieves promising performance and outperforms the existing methods.

Third, we propose a multi-modal deep polynomial network (MDPN) algorithm to integrate these properties effectively, and combine it with a regularized extreme learning machine (RELM) to predict PPIs. The three physicochemical properties used are the amino acid mutation rate, hydrophobicity, and hydrophilicity. The MDPN consists of two DPN stages: the first stage encodes each type of multi-modal protein feature with a DPN to obtain a high-level feature representation, and the second stage cascades all three types of high-level features and feeds them into a further DPN to fuse and learn them. Experimental results on different datasets, and comparisons with state-of-the-art methods, show that the proposed method is powerful and robust for PPI prediction.

We validated the two proposed models with five-fold cross-validation on three core protein datasets: H. pylori, Human, and Yeast. To further verify their scalability, experiments were also conducted on six cross-species protein datasets and three core protein networks. Extensive comparative experiments confirm that the two PPI models proposed in this thesis are more effective than the existing mainstream methods.
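The hydrophilicity step described in the abstract (index encoding followed by a DWT that yields a fixed-size representation) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the thesis's implementation: the abstract does not name the hydrophilicity index, the wavelet, the number of decomposition levels, or how uniform size is obtained. The sketch assumes the Hopp-Woods scale (values as commonly tabulated; verify before use), a Haar wavelet, three levels, and four summary statistics per coefficient band. The function names `encode`, `haar_step`, `band_summary`, and `dwt_features` are hypothetical.

```python
import math
import statistics

# Assumed hydrophilicity index: Hopp-Woods scale (the abstract does not name one).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def encode(sequence):
    """Map each standard residue to its index value; skip unknown symbols."""
    return [HOPP_WOODS[aa] for aa in sequence.upper() if aa in HOPP_WOODS]


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    scale = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def band_summary(coeffs):
    """Four statistics per band, so the output size is independent of length."""
    return [min(coeffs), max(coeffs), statistics.fmean(coeffs), statistics.pstdev(coeffs)]


def dwt_features(sequence, levels=3):
    """Return 4 * (levels + 1) features for a protein sequence of any length."""
    approx = encode(sequence)
    if not approx:
        raise ValueError("sequence contains no standard amino acids")
    features = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(band_summary(detail))
    features.extend(band_summary(approx))  # final approximation band
    return features
```

Calling `dwt_features("MKTAYIAKQR")` and `dwt_features` on a much longer sequence both return 16 values, which is what allows sequences of different lengths to be compared by a downstream classifier. Summarizing each band with statistics is only one way to reach a uniform size; resampling or truncating the coefficient vectors would be alternatives.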
Keywords/Search Tags: PPIs prediction, multi-modal feature fusion, SNF, MDPN