Research On Precise Prediction Of Protein Post-translational Modification Sites

Posted on: 2024-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Meng
GTID: 2530306938451484
Subject: Computer science and technology
Abstract/Summary:
Post-translational modification (PTM) is a crucial step in the synthesis of mature proteins in organelles. PTM significantly changes the physical and chemical properties of a protein, transforms its conformation, and directly alters its binding ability and function. Consequently, even if the expression level of a protein remains unchanged, a change in its PTM state can significantly change its function, and some protein functions are even related to many diseases. The prediction of protein PTM sites has therefore become an important topic. In previous studies, researchers accumulated a large amount of protein PTM data through high-throughput biological experiments and, based on various machine learning (ML) methods, developed PTM prediction tools that are more efficient, faster, and cheaper than traditional biological experimental methods. With the advent of the big data era, computing power has improved significantly, and researchers have applied deep learning methods to protein PTM prediction with effective results. However, PTM site prediction suffers from significant sample imbalance, and its accuracy still needs to be improved. In this thesis, machine learning and deep learning methods are used to predict lysine malonylation sites and, as a different modification type, succinylation sites. A recently proposed neural network model, the capsule network (CapsNet), is also used to improve prediction accuracy. The main research content and work are as follows:
(1) A Mal-PCASVM model is proposed to predict lysine malonylation sites by combining Principal Component Analysis (PCA) with a Support Vector Machine (SVM). Four further classifiers, namely random forest (RF), k-nearest neighbors (kNN), naive Bayes (NB), and an ensemble of decision trees (Ensemble), were constructed for comparison with Mal-PCASVM. Features were extracted from PTM site sequences using CKSAAP, one-hot, and AAindex encodings, and the encoded features were used as inputs to the different models. The experimental results indicate that PCA combined with SVM performs better.
(2) A PTM prediction model based on a convolutional neural network (CNN) with EAAC encoding is proposed. The input is an amino acid sequence; EAAC encoding is used to extract features, and a CNN model is constructed to predict lysine malonylation sites. In addition, one-dimensional and two-dimensional CNN models were constructed to predict succinylation sites on small datasets. Comparison with traditional machine learning models shows that the CNN models achieve better prediction performance on the different datasets.
(3) PTM prediction models based on EAAC-encoded fully convolutional networks (FCN) and capsule networks are proposed. FCNs are a development and extension of CNNs that use convolutional layers in place of the traditional fully connected layers. For input lysine sequences, convolutional layers represent local feature information, while fully connected layers represent global feature information. Replacing the fully connected layers of the CNN with convolutional layers therefore provides location information for the output features, which helps site prediction on sequence problems and improves prediction efficiency. This thesis constructs an FCN model to predict PTM sites and compares it experimentally with traditional machine learning models and the CNN. The results indicate that the FCN model outperforms traditional machine learning models. The capsule network (CapsNet) is a newly proposed model that can mitigate the loss of spatial information that occurs when convolutional neural networks extract features. In this thesis, lysine sequences are EAAC-encoded and passed through a series of convolutional operations, and the classification results are then obtained through the dynamic routing algorithm. Comparative experiments with traditional machine learning models and deep learning models show that the capsule network classification model achieves higher prediction accuracy.
(4) A protein PTM site prediction system has been built. Users can select the system's built-in datasets, load training and test datasets, or use their own datasets. They can then select feature extraction methods and classification models according to their needs, and finally select a validation method and start the run. After the run, the encoded sequences are automatically saved as a CSV file, which users can feed into other classifiers as input, and the results are displayed as a table or an image.
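The abstract does not spell out the EAAC encoding used in items (2) and (3). As a rough illustration only, the sketch below assumes the commonly used definition of EAAC: the amino acid composition is computed inside a fixed-length window that slides along the peptide. The function name `eaac_encode`, the window size of 5, the treatment of padding characters, and the example fragment are all assumptions made for this illustration and are not taken from the thesis.

```python
from typing import List

# The 20 standard amino acids in a fixed one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac_encode(peptide: str, window: int = 5) -> List[List[float]]:
    """Return one 20-dimensional composition vector per sliding window.

    Hypothetical helper: a sketch of EAAC-style encoding, not the
    thesis implementation.
    """
    if window < 1 or window > len(peptide):
        raise ValueError("window must be between 1 and len(peptide)")
    rows = []
    for start in range(len(peptide) - window + 1):
        segment = peptide[start:start + window]
        # Assumption: padding ('-') and non-standard residues are ignored,
        # so frequencies are taken over the standard residues only.
        residues = [aa for aa in segment if aa in AMINO_ACIDS]
        total = len(residues)
        rows.append([
            residues.count(aa) / total if total else 0.0
            for aa in AMINO_ACIDS
        ])
    return rows


# Made-up 11-residue fragment containing lysines (K); not real data.
fragment = "AAGDKLKEWSA"
features = eaac_encode(fragment, window=5)
print(len(features), len(features[0]))  # 7 windows, 20 values each
```

Under these assumptions, the resulting window-by-residue matrix could be flattened for a classical classifier such as an SVM, or kept two-dimensional as input to a convolutional model.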
Keywords/Search Tags: Post-translational modification of proteins, Deep learning, Fully convolutional neural network, Capsule network, PTM prediction system