Protein Fold Recognition Based On Amino Acid Sequence

Posted on:2023-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Dong

Full Text:PDF

GTID:2530307070973549

Subject:Statistics

Abstract/Summary:

Protein fold recognition is an important topic of "Biophysics in the 21st Century". Research on protein folds can support the early warning of genetic diseases and the design of protein-targeted drugs. Using optimization theory, machine learning, and deep learning, this thesis studies protein fold recognition on the DD and RDD protein data sets. The main contents are as follows.

First, because current representations of amino acid sequence information are incomplete, an optimization scoring model is proposed for determining the best feature subset. Four methods are used to extract protein sequence features: Pseudo Amino Acid Composition (PseAAC), Pseudo Position-Specific Scoring Matrix (PsePSSM), Encoding Based on Grouped Weight (EBGW), and Detrended Cross-Correlation Analysis (DCCA). The features obtained under different parameter values are scored to evaluate their information content, which determines the optimal parameter values. The four feature sets are then combined, and the best feature subset is selected to represent the amino acid sequence information.

Second, for the 27-class classification problem, the PSO-mcODM model is proposed for predicting protein folds. It combines the Particle Swarm Optimization (PSO) algorithm with the multi-class Optimal Margin Distribution Machine (mcODM). Building on SVM, the mcODM algorithm maximizes the margin mean while minimizing the margin variance, and uses the Random Mirror Proximal Descent method to solve the resulting non-convex, non-smooth optimization problem, which improves multi-class performance more efficiently.

Third, because traditional machine learning models train inefficiently on complex multi-class problems and their prediction performance depends strongly on feature engineering, the DNN_fold prediction model is proposed. DNN_fold is a ten-layer deep neural network built with the Keras framework. The input layer takes the protein features and labels; the hidden layers consist of fully connected layers with decreasing numbers of neurons, together with dropout layers, and learn from the input features iteratively; the output layer outputs scores for the 27 fold classes.

Finally, the experimental results show that PSO-mcODM performs well. Through layer-by-layer iterative learning, the DNN_fold framework captures more protein sequence information than traditional machine learning methods, which significantly improves the fold recognition rate and accuracy.
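
The abstract names Particle Swarm Optimization as one half of the proposed classifier but does not say how the two parts are coupled. The following is a minimal, generic PSO sketch in plain Python, assuming (this is an assumption, not something the abstract states) that PSO searches over model hyperparameters to minimize a validation error. The function names `pso_minimize` and `toy_validation_error`, the coefficient values, and the two-parameter search space are all hypothetical; the toy objective merely stands in for training and scoring a real classifier.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm minimizer (illustrative sketch only)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the box, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    # Hypothetical stand-in for "train a classifier with these two
    # hyperparameters and return its validation error".
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


best_params, best_err = pso_minimize(
    toy_validation_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best_params, best_err)
```

In an actual pipeline, `toy_validation_error` would be replaced by a function that fits the classifier on the training folds and returns, for example, one minus the cross-validated accuracy.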

Keywords/Search Tags:

Protein Fold Recognition, PSO, mcODM, Deep Learning

Related items

1	Research On Protein Fold Recognition Based On Multi-view Learning Algorithm
2	Protein Fold Recognition Based On Fold-specific Features
3	Extraction Of Shortest Representation Of Protein Folds Based On Convolutional Neural Network
4	A Novel Ensemble Classifier And Its Application In Protein Fold Recognition
5	Protein Remote Homology Detection And Folding Recognition Based On SCOP Topology
6	Research On Protein Subcellular Location Method Based On Deep Learning
7	Recombination Hotspots And Protein Fold Recognition Based On Sequence Information
8	Deep Learning Methods For Space-Based Electromagnetic Signal Recognition
9	Researches On Transmembrane Protein Fold Recognition
10	Application Of Deep Learning In Facial Recognition And Motor Posture Analysis Of Non-Human Primates