
Prediction Of Protein Structure Classes Based On Multi-kernel SVM

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wen
GTID: 2370330602489833
Subject: Computer application technology

Abstract/Summary:
Proteins are the most important material basis of all living organisms, and their specific spatial structure determines their biological function. Studying protein structural classes helps in understanding higher-level structural features of proteins, which provides important clues for protein function studies and pharmaceutical research. Conventional biological experiments are costly and cannot meet the needs of large-scale analysis of protein structural classes. Representing protein sequence information as vectors and applying machine learning algorithms is an effective way to predict protein structural classes. To improve the accuracy of protein structural class prediction, this thesis studies protein sequence feature extraction and the kernel functions of support vector machines (SVMs). First, different types of sequence feature extraction methods are used to fuse multiple classes of features for each protein sequence; then, a multi-kernel SVM is used to identify and predict protein structural classes. The research contents are as follows:

(1) We propose a multi-information fusion model for protein structural class prediction. Several main protein feature extraction methods were studied, including PseAAC, OTC, DPC, and ACS, as well as principal component analysis and other types of feature selection methods. These four methods are used to characterize the amino acid sequence. Then, low-variance filtering and principal component analysis were used to reduce the dimensionality of the feature vectors and make the retained information more representative. The performance of different feature set combinations on different classifiers is compared. The results show that model accuracy is highest when the input is the reduced PseAAC-DPC-OTC feature set, and that the baseline SVM algorithm is better suited to classifying the data after dimensionality reduction. Cross-validation experiments show that the feature-fusion-based method for protein structural class prediction obtains effective prediction results on the benchmark data set.

(2) We studied protein structural class prediction using a weighted multi-kernel SVM with multi-information fusion. First, PseAAC, OTC, and DPC are used to extract features from the protein structure data set. Second, the number of candidate base kernel function types and their internal parameters are determined, and combinations of kernel functions under a variety of parameter settings are compared. Using five-fold cross-validation, this thesis compares the accuracy of the classification model under different parameters. Finally, by comparing accuracy, precision, recall, F1 score, and AUC, the best kernel parameters and the fused feature set best suited to the multi-kernel SVM model are determined. Experimental results show that the PseAAC-DPC feature set combined with the multi-kernel SVM model effectively improves the prediction performance for protein structural classes. Compared with other extended SVM models, the proposed classification model based on multi-kernel SVM and feature fusion has clear advantages in predicting protein structural classes, reaching an overall accuracy of 89.13%.
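To make the fusion-plus-reduction step in (1) concrete, here is a minimal sketch assuming NumPy and scikit-learn. It computes two simple sequence descriptors, the 20-dimensional amino acid composition (AAC, the base component of PseAAC) and the 400-dimensional dipeptide composition (DPC), concatenates them (serial feature fusion), and then chains a low-variance filter, PCA, and an SVM. PseAAC's sequence-order correlation terms and the OTC and ACS descriptors are not reproduced here, and the helper names (`aac`, `dpc`, `fuse_features`, `build_model`), the variance threshold, the 95% retained variance, and the SVM settings are hypothetical illustrations, not the thesis's actual values.

```python
from itertools import product

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
DP_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def aac(seq):
    """20-dim amino acid composition: fraction of each standard residue."""
    counts = np.zeros(20)
    for aa in seq:
        if aa in AA_INDEX:  # skip non-standard symbols such as X or B
            counts[AA_INDEX[aa]] += 1
    return counts / max(counts.sum(), 1)


def dpc(seq):
    """400-dim dipeptide composition: frequency of each adjacent residue pair."""
    counts = np.zeros(400)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DP_INDEX:
            counts[DP_INDEX[pair]] += 1
    return counts / max(counts.sum(), 1)


def fuse_features(seqs):
    """Serial fusion: concatenate AAC and DPC into one 420-dim vector per sequence."""
    return np.array([np.concatenate([aac(s), dpc(s)]) for s in seqs])


def build_model(var_threshold=1e-5, n_components=0.95):
    """Hypothetical settings: drop near-constant features, keep 95% of the
    variance with PCA, then classify with an RBF-kernel SVM."""
    return make_pipeline(
        VarianceThreshold(threshold=var_threshold),
        PCA(n_components=n_components, svd_solver="full"),
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )
```

Fitting the pipeline on the fused matrix, e.g. `build_model().fit(fuse_features(train_seqs), train_labels)`, guarantees that the variance filter and PCA are learned only from the training data, which matters once the pipeline is placed inside cross-validation.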
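The weighted multi-kernel SVM in (2) can be sketched as a weighted sum of base kernels, K = Σ_m w_m K_m with w_m ≥ 0 and Σ_m w_m = 1, passed to an SVM as a precomputed Gram matrix and scored with five-fold cross-validation. The sketch below assumes scikit-learn and three base kernels (linear, polynomial, RBF); the kernel types, weights, and parameters are hypothetical, and the weights are picked by a simple grid rather than by any particular multiple kernel learning optimizer the thesis may use. The synthetic four-class data (mirroring the commonly used all-α, all-β, α/β, α+β classes, an assumption here) only stands in for real fused feature vectors.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def combined_kernel(X, Y, weights=(0.2, 0.3, 0.5), gamma=0.1, degree=2):
    """Weighted sum of base kernels: K = w1*K_lin + w2*K_poly + w3*K_rbf.

    Non-negative weights keep K positive semi-definite (a valid kernel).
    """
    w_lin, w_poly, w_rbf = weights
    return (w_lin * linear_kernel(X, Y)
            + w_poly * polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=1.0)
            + w_rbf * rbf_kernel(X, Y, gamma=gamma))


def cross_validate_mkl(X, y, weights, C=1.0, n_splits=5, seed=0):
    """Mean accuracy of a precomputed-kernel SVC under stratified k-fold CV."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, X_te = X[train_idx], X[test_idx]
        clf = SVC(kernel="precomputed", C=C)
        clf.fit(combined_kernel(X_tr, X_tr, weights), y[train_idx])
        # Test Gram matrix: rows are test samples, columns are training samples.
        y_pred = clf.predict(combined_kernel(X_te, X_tr, weights))
        scores.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 4 classes, 40 samples each, 30 features.
    n_per_class, n_features = 40, 30
    centers = rng.normal(scale=2.0, size=(4, n_features))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat(np.arange(4), n_per_class)

    # Compare a few candidate weight vectors by five-fold CV accuracy.
    candidates = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.2, 0.3, 0.5), (1 / 3, 1 / 3, 1 / 3)]
    for w in candidates:
        print(w, round(cross_validate_mkl(X, y, w), 3))
```

Single-kernel SVMs appear as the corner cases of the weight grid (for example `(0.0, 0.0, 1.0)` is a plain RBF SVM), so the same loop directly compares the combined kernel against its components. The per-fold predictions can also be passed to `sklearn.metrics` functions such as `precision_score`, `recall_score`, and `f1_score` to report the other metrics listed above.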
Keywords/Search Tags: protein structure classes, SVM, multi-kernel learning, feature extraction