Research On Prediction Of Protein-Protein Interactions Based On Deep Neural Network And Ensemble Method

Posted on: 2020-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Wang
GTID: 2370330575981212 | Subject: Computer technology
Abstract/Summary:
Proteins are among the material bases of life. Protein-protein interactions (PPIs) control almost all cellular processes and play critical roles in a wide range of physiological functions. Studying these interactions is valuable not only for understanding the mechanisms of biological processes but also for drug design and for analyzing the emergence and spread of disease. Because traditional experimental methods are time-consuming and limited in the number of proteins they can cover, there is growing demand for computational methods, which are usually convenient and flexible. Among these, machine-learning-based PPI prediction has drawn much attention. In the post-genome era, genomic data such as gene sequences are abundant, which lays the foundation for the extensive use of machine learning in PPI prediction, and many relevant methods have already been developed. However, although advances in machine learning algorithms have markedly improved model performance, many of the newer models have not yet been widely applied to PPI prediction. Moreover, the great variety of existing data makes it difficult to establish PPI prediction standards that make full use of it. In addition, feature extraction methods and classifiers usually depend on each other, so the data source and feature extraction method often have to be changed whenever the classifier is replaced. This thesis therefore develops two prediction methods: one based on a deep neural network and one based on an ensemble method.

The deep-neural-network method integrated protein features drawn from diverse databases, namely sequence similarity, essentiality, subcellular localization and Gene Ontology (GO) term semantic similarity, into a single highly abstract, low-dimensional vector. A deep neural network was then constructed to learn automatically from these features. The experimental results showed that the extracted features were well suited to PPI prediction: they reduced the time spent on training and prediction and also improved prediction accuracy. The constructed network also showed good generalization capability.

Because the deep-neural-network method requires many kinds of protein features to be integrated, the labor needed to prepare data before each experiment is high. In addition, some proteins lack the relevant information and cannot be encoded as vectors. Protein sequences, by contrast, are readily accessible and abundant. As a complement, this thesis therefore also proposes a prediction method based on the continuous wavelet transform (CWT) and ensemble learning. Each protein sequence was first converted into numeric vectors using various physicochemical properties of amino acids. The CWT and the wavelet power spectrum were then used to extract fixed-length feature vectors from this sequence information. Finally, seven random forest (RF) classifiers were used to predict PPIs. The experimental results showed that the proposed method achieved high performance on various datasets and merits further exploration.
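The abstract does not give the deep neural network's architecture or training settings. The following is a minimal NumPy sketch of the general idea, a fully connected network that maps one low-dimensional feature vector per protein pair to an interaction probability, under stated assumptions: the seven-column feature layout, the layer widths (32 and 16), the learning rate, the epoch count and all function names are hypothetical, and the data are synthetic rather than the thesis's datasets.

```python
import numpy as np

def init_params(sizes, rng):
    """He-initialised weights and zero biases for each fully connected layer."""
    return [[rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return every layer's activations: ReLU hidden layers, sigmoid output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))
        else:
            acts.append(1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0))))
    return acts

def train(X, y, hidden=(32, 16), lr=0.2, epochs=2000, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    params = init_params([X.shape[1], *hidden, 1], rng)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, X)
        delta = (acts[-1] - y) / len(X)        # dLoss/dlogit for sigmoid + BCE
        for i in range(len(params) - 1, -1, -1):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                          # backpropagate through the ReLU
                delta = (delta @ W.T) * (acts[i] > 0)
            params[i] = [W - lr * grad_W, b - lr * grad_b]
    return params

def predict_proba(params, X):
    """Predicted interaction probability for each protein pair."""
    return forward(params, X)[-1].ravel()

if __name__ == "__main__":
    # Synthetic stand-in data: 7 hypothetical columns per pair, e.g.
    # [seq_sim, ess_a, ess_b, same_location, go_bp, go_mf, go_cc], drawn uniformly in [0, 1].
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(600, 7))
    y = (0.4 * X[:, 0] + 0.3 * X[:, 3] + 0.3 * X[:, 4] > 0.5).astype(float)
    mu, sd = X[:400].mean(axis=0), X[:400].std(axis=0)
    Xs = (X - mu) / sd                         # standardise with training statistics only
    params = train(Xs[:400], y[:400])
    acc = ((predict_proba(params, Xs[400:]) > 0.5) == y[400:]).mean()
    print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

Standardizing with statistics computed on the training split only keeps the held-out pairs from leaking into training. With real data, the columns would hold the integrated sequence-similarity, essentiality, localization and GO scores rather than random numbers.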
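For the second method, the abstract does not name the wavelet, the scales or the physicochemical properties used. A minimal sketch of turning a sequence into a fixed-length CWT feature vector, assuming the complex Morlet wavelet (ω0 = 6), ten geometrically spaced scales, the Kyte-Doolittle hydrophobicity scale as the single example property, and time-averaging of the wavelet power spectrum as the way to obtain equal-length vectors; every function name is hypothetical. The resulting pair vectors would then feed the seven RF classifiers, whose configuration and combination rule the abstract does not specify, so that stage is omitted here.

```python
import numpy as np

# Kyte-Doolittle hydrophobicity: one example property (an assumption; the
# abstract only says "various physicochemical properties").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue to its property value; unknown residues (e.g. X) map to 0."""
    return np.array([scale.get(aa, 0.0) for aa in sequence.upper()], dtype=float)

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt(signal, scales, w0=6.0):
    """Direct CWT: W(s, b) = s^-1/2 * sum_t x(t) conj(psi((t - b) / s)).

    Uses O(L^2) memory per scale and treats the signal as zero outside its ends.
    """
    n = np.arange(len(signal))
    offsets = n[None, :] - n[:, None]          # offsets[b, t] = t - b
    coeffs = np.empty((len(scales), len(signal)), dtype=complex)
    for j, s in enumerate(scales):
        kernel = np.conj(morlet(offsets / s, w0)) / np.sqrt(s)
        coeffs[j] = kernel @ signal
    return coeffs

def wavelet_features(sequence, scales=2.0 ** np.arange(0, 5, 0.5)):
    """Fixed-length vector: time-averaged wavelet power at each of the 10 scales."""
    x = encode(sequence)
    x = x - x.mean()                           # remove the constant offset
    power = np.abs(cwt(x, scales)) ** 2        # wavelet power spectrum
    return power.mean(axis=1)                  # one value per scale, any sequence length

def pair_features(seq_a, seq_b):
    """Concatenate both proteins' vectors (an assumed, order-dependent pairing)."""
    return np.concatenate([wavelet_features(seq_a), wavelet_features(seq_b)])

if __name__ == "__main__":
    f = pair_features("MKTAYIAKQRQISFVKSHFSRQ", "MSLLTEVETPIRNEWGCRCNDSSD")
    print(f.shape)
```

Because the power is averaged over residue positions, a 50-residue and a 500-residue protein both yield one value per scale, which is what lets sequences of different lengths share a single classifier input. Each additional property would contribute another block of the same length.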
Keywords/Search Tags: protein-protein interaction, protein features, protein sequence, deep neural network, continuous wavelet transform, ensemble method