Antimicrobial peptides (AMPs) are natural polypeptides with antibacterial activity. They are found in a wide range of organisms and are an important part of the innate immune system. To address the growing problem of resistance to conventional antibiotics, AMPs have been applied in different fields as promising alternatives. Some AMPs have also shown potential against COVID-19, hindering viral infectivity in diverse ways. Deep learning-based methods are therefore valuable because they can quickly screen candidate AMPs from massive collections of protein sequences and help discover new ones. In this thesis, we designed a flexible and interpretable deep learning model. Different sequence embedding encodings capture both primary sequence information and evolutionary information, and different feature extraction modules learn feature matrices from different aspects of the sequence. In addition, multiple protein sequence feature descriptors were collected from the earlier literature, and the most effective descriptors were evaluated and selected with different machine learning models to serve as supplementary information. An attention mechanism then fuses this complementary information with the learned feature matrix into a mixed feature that is used to classify AMPs. Our model can therefore learn efficient sequence encodings while adaptively incorporating heterogeneous features. It shows strong learning ability on the benchmark dataset: the model consistently converges quickly and stably on the training set while generalizing well to the validation and test sets. Compared with the latest methods, our model achieves an accuracy of 0.919 on the independent test set, second only to the ACEP model. On the independent validation set, however, the ACEP model reaches an accuracy of only 0.878, whereas our model reaches 0.931. This indicates that our model is more stable while maintaining high accuracy, which supports both its fitting ability and its generalization ability. During training, the model also learned the similarity between amino acids and an attention score matrix over the sequence, which gives it a degree of interpretability and extensibility.
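The abstract does not specify which descriptors were selected or how the attention layer is parameterized, so the following is only a minimal, illustrative sketch of the general idea under stated assumptions. It computes one common hand-crafted descriptor, amino acid composition (AAC), and fuses equal-length feature vectors with scaled dot-product attention weights. The function names `amino_acid_composition` and `attention_fuse`, the fixed `query` vector, and the choice of AAC are all hypothetical stand-ins, not the thesis's actual method.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (a common hand-crafted descriptor).

    Non-standard letters are ignored; an empty or all-unknown sequence gives zeros.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention_fuse(features: list[list[float]], query: list[float]):
    """Fuse equal-length feature vectors into one vector with attention weights.

    Each source vector is scored by its dot product with a query vector
    (scaled by sqrt(dim)); softmax turns the scores into weights, and the
    fused vector is the weighted sum of the sources. In a trained model the
    query would be learned; here it is supplied by the caller (hypothetical).
    """
    dim = len(query)
    for f in features:
        if len(f) != dim:
            raise ValueError("all feature vectors must match the query dimension")
    scale = math.sqrt(dim)
    scores = [sum(q * x for q, x in zip(query, f)) / scale for f in features]
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    seq = "GIGKFLHSAKKFGKAFVGEIMNS"  # arbitrary example peptide sequence
    aac = amino_acid_composition(seq)
    learned = [0.05] * len(AMINO_ACIDS)  # hypothetical stand-in for a learned feature vector
    fused, weights = attention_fuse([aac, learned], query=[1.0] * len(AMINO_ACIDS))
    print("attention weights:", weights)
```

Because both sources here sum to 1 and the query is all ones, the two weights come out equal; with a learned query, the weights would shift toward whichever source is more informative, which is the intuition behind "adaptively incorporating heterogeneous features."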