Research On Prediction Of Protein-protein Interactions Based On SIFT Algorithm And Parallel Support Vector Machine

Posted on: 2019-04-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Shi
GTID: 2428330566961895 | Subject: Computer technology
Abstract/Summary:
Protein-protein interactions are currently detected either by biological experiments or by computational methods. Traditional biological experiments achieve high detection accuracy, but only at small scale. Newer high-throughput methods can detect protein-protein interactions at large scale, but their results often contain high rates of false positives and false negatives. Biological experiments are also usually expensive and time-consuming. With the development of big data and machine learning technologies, computational methods have come to compensate for these shortcomings, and they are increasingly used to predict protein-protein interactions. However, these computational methods still have drawbacks; for example, training the models and finding the best training parameters takes considerable effort. This thesis proposes several improvements to address these drawbacks.

First, a computational method based on the scale-invariant feature transform (SIFT) and the weighted extreme learning machine (WELM) is proposed to predict protein-protein interactions. The experimental data are protein sequences represented as position-specific scoring matrices (PSSMs), which contain protein evolutionary information and can improve prediction results. The PSSM data are first preprocessed, the SIFT algorithm is then applied to extract features, and principal component analysis is used to reduce the dimension of the extracted features. Finally, a WELM classifier is employed to predict protein-protein interactions, and a series of evaluation results is obtained. On three well-known datasets, Yeast, Human and Helicobacter pylori, the proposed method achieves average five-fold cross-validation accuracies of 94.83%, 97.60% and 83.64%, respectively. To better evaluate the proposed method, the "SIFT+WELM" prediction model is compared in detail with the popular SVM-based method, and also with several recently developed methods.

Second, because the training time of a support vector machine grows rapidly as the amount of training data increases, a prediction model based on a parallel support vector machine is proposed. The SIFT algorithm or a low-rank approximation method is first used to extract features from the datasets. The K-means algorithm is then applied to cut the extracted features into blocks, and the blocks are partitioned and rearranged into new features. Finally, the features are trained in parallel on the Hadoop distributed platform, and the experimental results are compared with those of the standard support vector machine in terms of time and accuracy.

The prediction model based on the SIFT algorithm and WELM achieves high prediction accuracy and shortens the time needed to find the best parameters, and so it addresses the shortcomings of existing prediction models. The prediction model based on the parallel support vector machine greatly shortens the training time while maintaining prediction accuracy.
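
The dimension-reduction and WELM classification steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The helper `pca_reduce`, the class `WeightedELM`, the sigmoid hidden layer, the hidden-layer size, the regularisation constant `C`, and the per-sample weight of one over the class size (one common WELM weighting scheme) are all assumptions made for illustration, and the random synthetic vectors stand in for SIFT descriptors extracted from PSSMs.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


class WeightedELM:
    """Binary weighted extreme learning machine (illustrative sketch)."""

    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden = n_hidden
        self.C = C
        self.seed = seed

    def _hidden(self, X):
        # Random-feature hidden layer with a sigmoid activation (assumed).
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in + self.b)))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Input weights and biases are drawn once and never trained.
        self.W_in = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)

        # Per-sample weight = 1 / (size of the sample's class), so the
        # minority class contributes as much to the loss as the majority.
        _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
        w = 1.0 / counts[inverse]

        t = np.where(y == 1, 1.0, -1.0)   # targets in {-1, +1}
        HtW = H.T * w                     # equals H^T W with W = diag(w)
        A = np.eye(self.n_hidden) / self.C + HtW @ H
        # Closed-form output weights: (I/C + H^T W H)^{-1} H^T W t
        self.beta = np.linalg.solve(A, HtW @ t)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


# Synthetic, imbalanced stand-in for feature vectors (not real PSSM data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, size=(80, 20)),   # 80 negative pairs
               rng.normal(+2.0, 1.0, size=(20, 20))])  # 20 positive pairs
y = np.array([0] * 80 + [1] * 20)

Z = pca_reduce(X, n_components=5)
model = WeightedELM(n_hidden=50, C=1.0).fit(Z, y)
acc = float((model.predict(Z) == y).mean())
print(f"training accuracy on synthetic data: {acc:.2f}")
```

Because the output weights are obtained from a single linear solve, only the hidden-layer size and `C` need tuning, which is consistent with the abstract's claim that the WELM model shortens the parameter search. A real evaluation would use five-fold cross validation on held-out data, not training accuracy.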
Keywords/Search Tags: Prediction of protein-protein interactions, Scale-invariant feature transform, Parallel support vector machine, Weighted extreme learning machine