Protein fold recognition is an important topic of "Biophysics in the 21st Century". Research on protein folds can support early warning of genetic diseases and the design of protein-targeted drugs. Drawing on optimization theory, machine learning and deep learning, this thesis studies protein fold recognition on the protein data sets DD and RDD. The main contents are as follows.

Firstly, because current representations express amino acid sequence information incompletely, an optimization-based scoring model for determining the best feature subset is proposed. Four methods, Pseudo Amino Acid Composition (PseAAC), Pseudo Position-Specific Scoring Matrix (PsePSSM), Encoding Based on Grouped Weight (EBGW) and Detrended Cross-Correlation Analysis (DCCA), are used to extract protein sequence features. The features obtained under different parameter values are scored to evaluate their information content, and the scores determine the optimal parameter value for each method. The four extracted feature sets are then combined, and the best feature subset is selected to represent the amino acid sequence information.

Secondly, for the 27-class fold classification problem, the PSO-mcODM model is proposed for predicting protein folds; it combines the Particle Swarm Optimization (PSO) algorithm with the multi-class Optimal margin Distribution Machine (mcODM). Building on the SVM, mcODM maximizes the margin mean while minimizing the margin variance, and a stochastic mirror proximal descent method is used to solve the resulting non-convex, non-smooth optimization problem, which can improve multi-class classification performance more efficiently.

Thirdly, because traditional machine learning models train inefficiently on complex multi-class problems and their prediction performance depends heavily on feature engineering, the DNN_fold prediction model is proposed. DNN_fold is a ten-layer deep recognition network built with the Keras framework. The input layer receives the protein features and labels; the hidden layers consist of fully connected layers with a decreasing number of neurons, interleaved with dropout layers, and iteratively learn from the input features; the output layer outputs a score for each of the 27 fold classes.

Finally, the experimental results show that PSO-mcODM performs well. Through layer-by-layer iterative learning, the DNN_fold framework extracts more protein sequence information than traditional machine learning methods, which significantly improves the fold recognition rate and accuracy.
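The PseAAC features mentioned above can be sketched as follows, assuming the standard type-1 (Chou) formulation: 20 normalized amino acid frequencies plus lam sequence-order correlation factors computed from standardized physicochemical property values. The function name `pseaac`, the defaults `lam=5` and `w=0.05`, and the toy property table are illustrative assumptions; the abstract does not say which properties or parameter ranges the thesis uses, although `lam` and `w` are the kind of parameter values its scoring model would compare.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def standardize(prop):
    """Standardize one property table over the 20 amino acids (zero mean, unit std)."""
    vals = np.array([prop[a] for a in AMINO_ACIDS], dtype=float)
    vals = (vals - vals.mean()) / vals.std()
    return dict(zip(AMINO_ACIDS, vals))

def pseaac(seq, props, lam=5, w=0.05):
    """Type-1 pseudo amino acid composition: returns 20 + lam features summing to 1."""
    seq = [a for a in seq.upper() if a in AMINO_ACIDS]  # drop non-standard residues
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    std_props = [standardize(p) for p in props]
    # Normalized occurrence frequency of each of the 20 amino acids.
    freq = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / L
    # Sequence-order correlation factors theta_1 .. theta_lam.
    theta = np.empty(lam)
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(L - j):
            # Average squared property difference between residues j apart.
            total += np.mean([(p[seq[i + j]] - p[seq[i]]) ** 2 for p in std_props])
        theta[j - 1] = total / (L - j)
    denom = freq.sum() + w * theta.sum()
    return np.concatenate([freq / denom, w * theta / denom])

# Hypothetical toy property, for illustration only (not a real amino acid scale).
toy_property = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
features = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_property], lam=5)
```

In real use, `props` would hold published scales such as hydrophobicity or side-chain mass, and each candidate `lam` would yield a different feature set to score.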
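The PsePSSM features named above can be sketched in the same spirit, assuming the commonly used form: the 20 column averages of an L x 20 PSSM plus, for each lag xi = 1..lag, the per-column mean squared difference between rows xi positions apart. The sigmoid normalization, the function name `psepssm` and the default `lag=3` are assumptions; generating the PSSM itself (for example with PSI-BLAST) is outside this sketch.

```python
import numpy as np

def psepssm(pssm, lag=3):
    """PsePSSM: 20 column means plus 20 * lag sequence-order terms from an L x 20 PSSM."""
    P = np.asarray(pssm, dtype=float)
    L, n_cols = P.shape
    if n_cols != 20 or L <= lag:
        raise ValueError("expected an L x 20 PSSM with L > lag")
    # Assumption: squash raw log-odds scores into (0, 1) with a sigmoid.
    P = 1.0 / (1.0 + np.exp(-P))
    features = [P.mean(axis=0)]  # average score in each amino acid column
    for xi in range(1, lag + 1):
        # Mean squared difference between rows xi positions apart, per column.
        features.append(((P[:-xi] - P[xi:]) ** 2).mean(axis=0))
    return np.concatenate(features)
```

With `lag=3` this gives 20 + 60 = 80 features; `lag` is another parameter whose values the scoring step could compare.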
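For the EBGW features mentioned above, here is a hedged sketch assuming the grouping under which EBGW is usually described: residues fall into four physicochemical classes (neutral non-polar, neutral polar, acidic, basic), three binary sequences are derived from pairings of those classes, and each binary sequence is summarized by the fraction of 1s in J nested prefixes. The class memberships, the function name `ebgw` and `J=5` are assumptions to check against the thesis.

```python
import numpy as np

# Assumed grouping into four physicochemical classes (verify against the thesis).
C1 = set("GAVLIMPFW")  # neutral, non-polar
C2 = set("QNSTYC")     # neutral, polar
C3 = set("DE")         # acidic
C4 = set("HKR")        # basic
# Residues mapped to 1 in the binary sequences H1, H2, H3 (all others map to 0).
PARTITIONS = [C1 | C2, C1 | C3, C1 | C4]

def ebgw(seq, J=5):
    """Encoding based on grouped weight: 3 binary sequences x J prefix weights."""
    seq = [a for a in seq.upper() if a in C1 | C2 | C3 | C4]
    L = len(seq)
    if L < J:
        raise ValueError("sequence shorter than J")
    features = []
    for ones in PARTITIONS:
        bits = np.array([1.0 if a in ones else 0.0 for a in seq])
        cumulative = np.cumsum(bits)
        for j in range(1, J + 1):
            d = (j * L) // J  # length of the j-th nested prefix
            features.append(cumulative[d - 1] / d)  # fraction of 1s in that prefix
    return np.array(features)
```

The output has 3 * J entries, so J plays the same role as the parameters scored for the other feature types.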
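DCCA, one of the four feature extractors above, measures the cross-correlation of two nonstationary series after removing local linear trends. The sketch computes the standard detrended cross-correlation coefficient rho_DCCA(n) over overlapping boxes of n + 1 points. How the thesis turns a protein sequence into numeric series (for example, by mapping residues to two physicochemical property values) is not stated in the abstract, so the inputs here are simply two equal-length arrays; the function names are hypothetical.

```python
import numpy as np

def _detrended_residuals(profile, start, n):
    """Residuals of the box profile[start : start + n + 1] around its linear trend."""
    t = np.arange(n + 1)
    box = profile[start:start + n + 1]
    slope, intercept = np.polyfit(t, box, 1)
    return box - (slope * t + intercept)

def dcca_coefficient(x, y, n):
    """Detrended cross-correlation coefficient rho_DCCA(n), always in [-1, 1]."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    N = len(x)
    if len(y) != N or not 2 <= n < N:
        raise ValueError("need len(x) == len(y) and 2 <= n < len(x)")
    # Integrated profiles of the mean-removed series.
    X = np.cumsum(x - x.mean())
    Y = np.cumsum(y - y.mean())
    f_xy = f_xx = f_yy = 0.0
    for i in range(N - n):  # overlapping boxes of n + 1 points
        rx = _detrended_residuals(X, i, n)
        ry = _detrended_residuals(Y, i, n)
        f_xy += np.mean(rx * ry)  # detrended covariance
        f_xx += np.mean(rx * rx)  # detrended variance of x
        f_yy += np.mean(ry * ry)  # detrended variance of y
    # The 1 / (N - n) averaging factor cancels in this ratio.
    return f_xy / np.sqrt(f_xx * f_yy)
```

Evaluating the coefficient at several box sizes n gives a small feature vector; the set of box sizes would be the parameter to score.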
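The step of combining the four feature groups and keeping the best subset can be illustrated with an exhaustive search, which is cheap here because four groups give only 15 non-empty combinations. The abstract does not describe the thesis's scoring model, so `score` below is a hypothetical placeholder (any callable returning a higher-is-better number), and `best_feature_subset` is an illustrative name.

```python
from itertools import combinations

import numpy as np

def best_feature_subset(blocks, score):
    """Score every non-empty combination of feature blocks and return the best.

    blocks: dict mapping a group name to an (n_samples, d_k) array.
    score:  hypothetical callable(X) -> float, higher is better.
    """
    names = sorted(blocks)
    best_combo, best_score = None, -np.inf
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Concatenate the chosen groups column-wise into one feature matrix.
            X = np.hstack([blocks[name] for name in combo])
            s = score(X)
            if s > best_score:
                best_combo, best_score = combo, s
    return best_combo, best_score
```

In practice `score` might be a cross-validated accuracy, for example `cross_val_score(SVC(), X, labels, cv=5).mean()` from scikit-learn with `labels` holding the 27 fold classes; that choice is an assumption, not the thesis's scoring model.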
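The PSO half of the PSO-mcODM model described above can be sketched as a standard global-best particle swarm minimizer. The abstract does not list which mcODM hyperparameters PSO searches or which fitness it uses; something like negative validation accuracy over the model's regularization parameters is plausible, but that is an assumption. The inertia weight `w`, acceleration constants `c1` and `c2`, swarm size and the name `pso_minimize` are illustrative defaults, demonstrated here on a toy quadratic.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization of f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                                     # particle velocities
    pbest = x.copy()                                         # best position per particle
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()                     # best position of the swarm
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward each particle's best + pull toward the swarm's best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Toy check: a shifted quadratic whose minimum is at (1, 1).
best, value = pso_minimize(lambda p: float(np.sum((p - 1.0) ** 2)), [-5, -5], [5, 5])
```

To tune a classifier instead, `f` would train the model with the candidate hyperparameters and return the negative validation accuracy.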
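To make "maximizes the margin mean while minimizing the margin variance" concrete, the sketch below defines the multi-class margin of a linear model as the true-class score minus the best rival-class score. It then minimizes 0.5*||W||^2 + lam_var*Var(margin) - lam_mean*Mean(margin) by full-batch subgradient descent. This is a simplified stand-in: it is not the exact mcODM formulation, and plain subgradient descent replaces the stochastic mirror proximal descent solver used in the thesis. All names and hyperparameter defaults are hypothetical.

```python
import numpy as np

def margins(W, X, y):
    """Margin per sample (true-class score minus best rival score) and the rival class."""
    rows = np.arange(len(y))
    scores = X @ W.T                  # (m, n_classes)
    true = scores[rows, y]
    masked = scores.copy()
    masked[rows, y] = -np.inf         # exclude the true class from the rivals
    rival = masked.argmax(axis=1)
    return true - masked[rows, rival], rival

def fit_mean_variance(X, y, n_classes, lam_var=0.1, lam_mean=1.0, lr=0.1, n_iter=300):
    """Subgradient descent on 0.5*||W||^2 + lam_var*Var(margin) - lam_mean*Mean(margin).

    y must be an integer array of class indices in [0, n_classes).
    """
    m, d = X.shape
    W = np.zeros((n_classes, d))
    for _ in range(n_iter):
        gamma, rival = margins(W, X, y)
        # Derivative of the objective with respect to each margin gamma_i.
        g = (2.0 * lam_var * (gamma - gamma.mean()) - lam_mean) / m
        grad = W.copy()  # gradient of the 0.5*||W||^2 regularizer
        # gamma_i rises with W[y_i] . x_i and falls with W[rival_i] . x_i.
        np.add.at(grad, y, g[:, None] * X)
        np.add.at(grad, rival, -g[:, None] * X)
        W -= lr * grad
    return W

def predict(W, X):
    """Predicted class = highest linear score."""
    return (X @ W.T).argmax(axis=1)
```

A bias can be included by appending a constant column of ones to `X`. The max over rival classes makes the objective non-smooth, which is why a subgradient (rather than a gradient) is used.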
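The DNN_fold architecture described above can be sketched in Keras under explicit assumptions. One reading of "ten layers" is an input layer, four fully connected layers of decreasing width each followed by a dropout layer, and a 27-way softmax output. The widths (512, 256, 128, 64), the dropout rate 0.3, ReLU activations, the Adam optimizer and the name `build_dnn_fold` are hypothetical. In this sketch the labels are not fed into the input layer; they are passed to `model.fit` as training targets.

```python
import numpy as np
from tensorflow import keras

def build_dnn_fold(input_dim, n_classes=27, units=(512, 256, 128, 64), dropout=0.3):
    """Dense layers of shrinking width, each followed by dropout, then a softmax output."""
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for n in units:
        model.add(keras.layers.Dense(n, activation="relu"))
        model.add(keras.layers.Dropout(dropout))  # randomly drops units during training
    # One probability score per fold class.
    model.add(keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer fold labels 0..26
                  metrics=["accuracy"])
    return model

# Quick shape check on dummy input (100 is a hypothetical feature dimension).
model = build_dnn_fold(input_dim=100)
probs = model.predict(np.zeros((2, 100), dtype="float32"), verbose=0)  # shape (2, 27)
```

Training would then be `model.fit(X_train, y_train, ...)` with the selected feature matrix and integer fold labels.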