Functional proteins are proteins that carry out the physiological functions of organisms and drive the metabolic activities of the human body. In this thesis, we combine deep learning algorithms with traditional machine learning algorithms to predict several types of functional proteins, including antimicrobial peptides (AMPs) and interleukin (IL)-inducing peptides. The main research contents are as follows:

(1) Antimicrobial peptides (AMPs) are an important part of the innate immune system. They are characterized by high stability, no pollution, antibacterial activity against a variety of bacteria, fungi and viruses, and a relatively low probability of inducing drug resistance, and they are considered to be the best substitute for antibiotics. In this thesis, an ensemble framework, EnAMP, is constructed for AMP prediction. The architecture consists of three parts: the first is a random forest (RF), the second is a support vector machine (SVM), and the third is a deep learning (DL) model comprising a convolutional neural network (CNN) and a bidirectional long short-term memory network (Bi-LSTM). Nine statistical features of protein sequences are fed to the RF and SVM, while word vectors pre-trained with two word-embedding techniques (Word2vec and GloVe) are fed to the DL model. The models are trained separately, and the final prediction is obtained by averaging their outputs. Experimental results on six data sets show that EnAMP improves significantly over existing models. Even compared with the Bidirectional Encoder Representations from Transformers (BERT) model, which has very high computational complexity, EnAMP remains competitive while its computational complexity is much lower.

(2) Interleukins (ILs) are a group of multifunctional cytokines that play important roles in immune regulation and inflammatory responses. Recently, IL-6 has been found to affect the development of COVID-19, and significantly elevated levels of IL-6 have been reported in patients with severe COVID-19. IL-10 and IL-17 are anti-inflammatory and pro-inflammatory cytokines, respectively: IL-10 prevents inflammatory responses, while IL-17 plays multiple protective roles in host defense against pathogens. A number of machine learning methods have been proposed to predict IL-inducing peptides, but their predictive performance needs further improvement, and the inducing peptides of different interleukins are predicted by separate methods rather than by a general approach. In our work, we combine the statistical features of peptide sequences with word embeddings to design a general ensemble model, EnILs, that predicts the inducing peptides of different interleukins by averaging the predicted probabilities of a random forest, eXtreme Gradient Boosting (XGB) and a neural network. Compared with state-of-the-art machine learning methods, EnILs achieves competitive performance in predicting IL-6-, IL-10- and IL-17-inducing peptides. In addition, as a case study, we predict the most promising IL-6-inducing peptides in the SARS-CoV-2 spike protein for further experimental verification. Studies have shown that serum from some patients is highly responsive to the spike protein domain KYEQYIKWPWYIWLG.
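
The average integration used by both ensemble models can be illustrated with a minimal sketch. It assumes each base model has already produced a positive-class probability for every peptide; the function name `average_ensemble`, the 0.5 decision threshold, and the example probabilities are hypothetical and are not taken from the thesis.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def average_ensemble(
    model_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; the thesis does not state one
) -> Tuple[List[float], List[int]]:
    """Average per-peptide positive-class probabilities across base models."""
    if not model_probs:
        raise ValueError("need predictions from at least one model")
    n_peptides = len(model_probs[0])
    if any(len(probs) != n_peptides for probs in model_probs):
        raise ValueError("all models must score the same peptides")

    # zip(*model_probs) groups the scores that belong to the same peptide.
    averaged = [fmean(scores) for scores in zip(*model_probs)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Made-up probabilities for three peptides from three base models.
rf_probs = [0.9, 0.2, 0.6]
xgb_probs = [0.8, 0.4, 0.3]
nn_probs = [0.7, 0.3, 0.45]

averaged, labels = average_ensemble([rf_probs, xgb_probs, nn_probs])
print(averaged, labels)
```

Unweighted averaging gives every base model the same influence on the final score, so no extra combiner has to be trained on top of the separately trained models.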