Research On Swarm Intelligence Feature Selection Algorithm For Protein Sequence Classification

Posted on:2022-11-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z W Yan

Full Text:PDF

GTID:2480306758492264

Subject:Automation Technology

Abstract/Summary:

As an essential substance for human growth, protein plays an important role in biological processes. Among the many proteins, neuropeptides are neurotransmitters and hormones that play an important role in the immune and endocrine systems. Neuropeptides form a large class of messenger molecules related to neurotransmission and are generally fewer than 50 amino acid residues in length. Although they are small proteins, neuropeptides are indispensable to the sound operation of many biological functions, for example development, differentiation, muscle contraction, learning, memory and adaptation. As research has advanced, more and more neuropeptides have been found. Accurate identification of neuropeptides is of far-reaching significance in neuroscience and is an indispensable part of basic research in immunoinformatics and drug development. However, current methods for accurately identifying and classifying neuropeptides rely mainly on biomedical experiments, which are time-consuming, costly and laborious. With the development of bioinformatics, many biomedical problems can be solved with computational methods, and the identification of neuropeptides is one of them.

This thesis proposes an ensemble learning model built on different protein feature description algorithms and different classifiers, with the algorithm parameters and weights optimized by particle swarm optimization, a swarm intelligence algorithm. In this ensemble predictor, 9 feature extraction algorithms are combined with 5 machine learning classification algorithms to generate 45 baseline learning models. In the first layer, feature selection is applied to the large number of extracted features for each of the 45 baseline models. In the second layer, eight base learners are selected according to the sum of the accuracies of each pair of baseline models and their Pearson correlation coefficient. In the third layer, the outputs of these learners are fed into classifiers such as logistic regression and extreme gradient boosting (XGBoost); one of these is then selected to train the final model, and its output is taken as the final prediction. To address the parameter sensitivity of the model, a swarm intelligence algorithm is used to optimize the key parameters at each stage, further improving classification accuracy.

The accuracy on the test data set is 0.9345, which is higher than that of existing models. We therefore hope that this predictor can contribute to the discovery of neuropeptides as new drugs for the treatment of nervous system diseases. The advantage of the model is that it avoids overfitting caused by noise and redundant features in the prediction model and effectively reduces the complexity of constructing the stacking model. Future research could take the species diversity of neuropeptides into account to build a model with stronger generalization ability, analyze neuropeptides through a more diverse set of features, and further optimize the model parameters.
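The parameter optimization step described in the abstract can be illustrated with a minimal particle swarm optimization sketch. This is not the thesis's implementation: the function name `pso_minimize`, the inertia and acceleration settings, and the toy objective `toy_error` (a stand-in for something like "1 minus validation accuracy" over two hypothetical weights) are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO.

    All default settings here are illustrative assumptions, not values
    taken from the thesis.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares the best position found by any particle.
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into its bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(weights):
    """Hypothetical error surface with its minimum at weights (0.3, 0.7)."""
    return (weights[0] - 0.3) ** 2 + (weights[1] - 0.7) ** 2


best_weights, best_error = pso_minimize(toy_error, [(0.0, 1.0), (0.0, 1.0)])
print(best_weights, best_error)
```

In a real pipeline the objective would presumably train or evaluate a model for the candidate parameters and return its validation error, which makes each evaluation far more expensive than in this toy case.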

Keywords/Search Tags:

Feature Selection, Neuropeptide, Machine Learning, Stacking Method

Related items

1	Prediction Of Harmful Algal Blooms Based On Machine Learning Technology
2	Prediction Of Neuropeptide Precursor And Its Cleavage Site Based On Machine Learning
3	Study On Risk Assessment Of Landslide Based On Machine Learning
4	Research On Identification Method Of N6-methylation Sites Based On Machine Learning
5	Database Construction And Precursor Prediction For Neuropeptide
6	Prediction Of RNA-protein Interactions Based On Machine Learning
7	Research On Modeling Of Forest Stock Volume And Model's Universal Applicability
8	Predicting Carbonylation Sites Based On Machine Learning Methods
9	Mining Probiotic Genome Molecular Markers And Constructing A Visual Screening Prediction Platform Based On Machine Learning
10	Research Of Brain-network-oriented Feature Selection Method And Its Application