Font Size: a A A

Research On Predicting Protein-protein Interactions Based On Machine Learning

Posted on:2022-11-08Degree:MasterType:Thesis
Country:ChinaCandidate:F S LiuFull Text:PDF
GTID:2480306608490044Subject:Control Engineering
Abstract/Summary:PDF Full Text Request
Proteins play an important role in the cellular process of organisms,and its function depends on protein-protein interaction.Abundant protein-protein interaction information can promote disease treatment and drug research.Therefore,the accurate prediction of protein-protein interaction is of great significance.High throughout biological experiments can be used to predict new protein pairs,but the operation is expensive and time-consuming,which can't meet the needs of people for this kind of information.With the rise of machine learning algorithms and the increasing power of computer computing,using scientific computing model to predict interaction has become the first choice.In this paper,a binary classification model for predicting protein-protein interaction is designed based on protein sequence,which is studied from the following two aspects:(1)A Protein-Protein Interaction Prediction Model Based on SVM-SRC Probabilistic Fusion Method.Aiming at the limitation of single classifier and the influence of noise on prediction results,SVM-SRC probability fusion method is proposed.The SVM-SRC probability fusion method selects the support vector machine with strong generalization ability as the sub classifier,and uses the interval hyperplane to distinguish the difficult edge samples and easy to identify samples.For the edge samples without obvious category bias,the filtered reconstruction dictionary is used to train the sparse representation classifier,and the probability is used to fuse the two decision results to predict the target category.Firstly,aiming at the problems of cumbersome and complex feature extraction methods and single feature types,four coding methods of combination,transition,distribution and auto covariance are used to analyze the physical and chemical properties of amino acids.The corresponding amino acid residues of protein sequence are digitized to form a new protein sequence characterization model,which comprehensively considers the effects of various physical and chemical properties on protein interaction.Then,in the feature selection module,the feature importance of random forest is used to obtain the best feature subset.While reducing the dimension of high-dimensional original features,it is also helpful to understand the potential relationship between the physical and chemical properties of different amino acids and protein interaction.Finally,the parameters and thresholds of model are optimized,and the data are input into the probability fusion model to obtain the decision result.The experimental results show that SRC can be an effective supplement to SVM.On yeast data sets,human data sets and Helicobacter pylori data sets,the accuracy of 5-fold cross validation is 94.7%,97.12% and 88.53% respectively,which has high recognition accuracy.(2)A Protein-Protein Interaction Prediction Model Based on Deep Learning.In order to make full use of the advantages of big data,a deep learning framework for predicting protein-protein interaction is proposed.A pair of protein sequences are encoded and fed to the embedding layer,long-term and short-term memory neural network and neural network with one hidden layer respectively,and then the output vector is connected from head to tail and input into the full connection layer with two hidden layers.The unknown protein sequence pair is predicted by softmax function.The network structure can learn the short-range and long-range dependence between amino acid residues in order space,and can extract more abstract and complex features.The experimental results show that the accuracy of the deep learning framework in the 5-fold cross validation of human protein-protein interaction data set is 98.9%,and has high recognition accuracy.
Keywords/Search Tags:Protein-protein Interaction, Support Vector Machine, Sparse Representation Classifier, Probabilistic Fusion, Deep Learning
PDF Full Text Request
Related items