Cell-penetrating Peptide Prediction Algorithm Based On Deep Learning

Posted on: 2022-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Niu
GTID: 2480306758991849
Subject: Automation Technology
Abstract/Summary:
Cell-penetrating peptides are a special class of polypeptide sequences that can carry various substances across the cell membrane while causing almost no damage to it. This property allows them to serve as carriers that transport drugs or other substances into target cells, which gives them high research value in medicine and biology; they are widely used in areas such as tumor treatment and gene therapy. However, identifying cell-penetrating peptides with traditional molecular biology methods is time-consuming and expensive, so identifying them from protein sequences by computational methods is an important and valuable problem. Current mainstream classification algorithms fall into two categories. The first uses feature representation methods to obtain feature vectors for a sequence and then applies a machine learning classifier to those vectors. The second uses a deep learning network both to extract features from protein sequences and to classify them.

This thesis proposes a new recognition algorithm, ConvCPP. A convolutional neural network is used to represent protein sequences as vectors, and these representations are combined with features from other extraction algorithms to obtain the final feature representation. Five different machine learning classification algorithms then classify the sequences. The benchmark dataset is CPP924, which contains 462 cell-penetrating peptide sequences and 462 non-cell-penetrating peptide sequences. High similarity between sequences in a dataset affects model prediction; CPP924 has been screened by its authors so that the similarity between sequences does not exceed 80%, which is one of the reasons for selecting it.

Before the convolutional neural network can be used for feature representation, protein sequences in text format must be converted into vectors that can serve as network input. Here each amino acid is represented as a one-dimensional vector, and the feature vector of a protein is composed of its amino acid vectors. The network structure is also improved and optimized for feature extraction. An attention layer is added before the feature vector enters the convolution layer, which speeds up the extraction of sequence features. Based on the experimental results, the pooling mode is changed from max pooling to dynamic k-max pooling, which enables the model to identify cell-penetrating peptides better. The model additionally combines several conventional protein feature representation algorithms: the amino acid composition method, the overlapping attribute representation method, and the 20-bit representation method. Feature selection on the combined feature vector uses the maximum relevance minimum redundancy (mRMR) algorithm and the t-test, which reduces the information redundancy introduced by feature fusion and further improves the model's performance in identifying the sample type. Naive Bayes, k-nearest neighbor, support vector machine, random forest, and extreme gradient boosting classifiers are used to classify sequences, and their results are integrated by voting.

To verify the performance of the proposed model, several comparative experiments are designed, including an ablation experiment on the convolutional network, a comparison of different feature representation methods, and a comparison of different classifiers. These experiments confirm both the improvement to the convolutional neural network and the model's ability to recognize cell-penetrating peptides. The experimental results on the CPP924 dataset show that, compared with other prediction models, ConvCPP improves accuracy (ACC) by 2.2% and the Matthews correlation coefficient (MCC) by 0.043, indicating better prediction performance than current cell-penetrating peptide recognition methods.
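
Several building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the ConvCPP implementation, which the abstract does not give in detail. The function names, the residue ordering in `AMINO_ACIDS`, and the depth-dependent pooling formula in `dynamic_k` are assumptions chosen for illustration, and the attention layer, the overlapping attribute representation, and feature selection are omitted.

```python
from collections import Counter

# Assumed ordering of the 20 standard residues (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Amino acid composition: fraction of each residue, 20 values in total."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def twenty_bit_encoding(seq):
    """20-bit representation: one 20-element indicator vector per residue."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in seq]


def k_max_pooling(values, k):
    """Keep the k largest activations while preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


def dynamic_k(seq_len, layer, total_layers, k_top):
    """Pooling size that shrinks with depth (assumed formula, not from the thesis).

    Computes max(k_top, ceil((total_layers - layer) / total_layers * seq_len))
    using integer arithmetic only.
    """
    scaled = -(-((total_layers - layer) * seq_len) // total_layers)  # ceiling division
    return max(k_top, scaled)


def majority_vote(predictions):
    """Hard voting over 0/1 labels produced by an odd number of classifiers."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


if __name__ == "__main__":
    peptide = "GRKKRRQRRR"  # arbitrary example string, not taken from CPP924
    print(amino_acid_composition(peptide))
    print(len(twenty_bit_encoding(peptide)), "residue vectors")
    print(k_max_pooling([0.1, 0.9, 0.3, 0.7, 0.5], k=3))
    print(majority_vote([1, 0, 1, 1, 0]))  # five classifier outputs
```

In this sketch `majority_vote` stands in for the integration of the five classifiers; a weighted or probability-based vote would be an equally plausible reading of "integrated by voting".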
Keywords/Search Tags: Deep Learning, Convolutional Neural Network, Protein Sequence Classification, Cell-penetrating Peptide