
Research On Swarm Intelligence Feature Selection Algorithm For Protein Sequence Classification

Posted on: 2022-11-28   Degree: Master   Type: Thesis
Country: China   Candidate: Z W Yan
GTID: 2480306758492264   Subject: Automation Technology
Abstract/Summary:
As an essential substance for human growth, protein plays an important role in biological processes. Among the many proteins, neuropeptides act as neurotransmitters and hormones and play important roles in the immune and endocrine systems. Neuropeptides are a large class of messenger molecules involved in neurotransmission, and they are generally fewer than 50 amino acid residues long. Although they are small proteins, neuropeptides are indispensable to the proper functioning of many biological processes, such as development, differentiation, muscle contraction, learning, memory and adaptation. As research advances, more and more neuropeptides are being discovered. Accurate identification of neuropeptides is of far-reaching significance for neuroscience and is indispensable to basic research in immunoinformatics and drug development. However, current methods for accurately identifying and classifying neuropeptides rely mainly on biomedical experiments, which are time-consuming, costly and laborious. With the development of bioinformatics, many biomedical problems can now be tackled with computational methods, and neuropeptide identification is naturally among them.

This paper proposes an ensemble learning model that combines different protein feature description algorithms with different classifiers, and optimizes the algorithm's parameter weights with particle swarm optimization (PSO), a swarm intelligence algorithm. In this ensemble predictor, 9 feature extraction algorithms are paired with 5 machine learning classification algorithms, producing 9 × 5 = 45 baseline models. In the first layer, feature selection is applied to the large number of extracted features for each of the 45 baseline models. In the second layer, eight base learners are selected according to the summed accuracy of pairs of baseline models and the Pearson correlation coefficient between them. In the third layer, the outputs of these base learners are fed into meta-classifiers such as logistic regression and extreme gradient boosting (XGBoost); one of them is selected to train the final model, whose output is taken as the final prediction. To address the model's sensitivity to its parameters, the swarm intelligence algorithm is used to optimize the key parameters at each stage, further improving classification accuracy.

On the test set, the model achieves an accuracy of 0.9345, higher than that of existing models. We therefore hope that this predictor can contribute to the discovery of neuropeptides as new drugs for treating nervous system diseases. The advantages of the model are that it avoids overfitting caused by noisy and redundant features and effectively reduces the complexity of constructing the stacked model. Future research could start from the species diversity of neuropeptides to build a model with stronger generalization ability, analyze neuropeptides from more diverse feature perspectives, and further optimize the model parameters.
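The second-layer step above picks eight base learners from the 45 baseline models using pair accuracy together with the Pearson correlation. The abstract does not give the exact scoring formula, so the sketch below assumes one plausible version: each pair is scored as acc_i + acc_j minus the Pearson correlation of their predicted probabilities (rewarding accurate but diverse pairs), and models are taken greedily from the best pairs until k are chosen. The function names, the scoring formula and the model labels ("F1+SVM" and so on) are hypothetical illustrations, not the thesis's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_base_learners(accuracy, scores, k=8):
    """Greedily pick k model names from the highest-scoring pairs.

    accuracy: dict name -> validation accuracy
    scores:   dict name -> predicted positive-class probabilities on the same samples
    ASSUMED pair score: acc_i + acc_j - pearson(scores_i, scores_j)
    """
    pairs = []
    for a, b in combinations(sorted(accuracy), 2):
        s = accuracy[a] + accuracy[b] - pearson(scores[a], scores[b])
        pairs.append((s, a, b))
    pairs.sort(reverse=True)  # best pairs first

    chosen = []
    for _, a, b in pairs:
        for name in (a, b):
            if name not in chosen and len(chosen) < k:
                chosen.append(name)
        if len(chosen) == k:
            break
    return chosen


# Hypothetical toy data: 4 baseline models scored on 4 validation samples.
accuracy = {"F1+SVM": 0.90, "F1+RF": 0.89, "F2+RF": 0.88, "F3+LR": 0.85}
scores = {
    "F1+SVM": [0.9, 0.2, 0.8, 0.1],
    "F1+RF": [0.85, 0.25, 0.75, 0.15],  # nearly identical to F1+SVM
    "F2+RF": [0.7, 0.4, 0.3, 0.2],
    "F3+LR": [0.6, 0.5, 0.7, 0.4],
}
print(select_base_learners(accuracy, scores, k=3))  # ['F2+RF', 'F3+LR', 'F1+SVM']
```

In this toy case the most accurate model is not picked first, and the near-duplicate "F1+RF" is left out, which is the kind of diversity the correlation term is meant to encourage.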
Keywords/Search Tags:Feature Selection, Neuropeptide, Machine Learning, Stacking Method
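The abstract states that particle swarm optimization is used to tune key parameters at each stage of the stacked model. As a rough illustration, here is a minimal global-best PSO sketch over a box of continuous parameters. The inertia and acceleration constants (w=0.7, c1=c2=1.5), the swarm size and the toy objective are assumptions chosen for illustration; in the real pipeline the objective would presumably be a cross-validated accuracy, which is far more expensive to evaluate.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box (ASSUMED settings).

    objective: callable taking a list of floats and returning a score to MAXIMIZE
    bounds:    list of (low, high) pairs, one per parameter
    Returns (best_position, best_score).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_score = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]  # swarm's best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move, then clamp the particle back into the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            score = objective(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


def toy_objective(params):
    """Hypothetical stand-in for validation accuracy; peaks at (0.3, 2.0)."""
    weight, reg = params
    return -(weight - 0.3) ** 2 - (reg - 2.0) ** 2


best, best_score = pso_maximize(toy_objective, bounds=[(0.0, 1.0), (0.0, 5.0)])
print(best, best_score)
```

A fixed seed keeps runs reproducible; with a noisy objective such as cross-validated accuracy, several seeds would usually be compared.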