Research On Protein Sequence Classification Based On Deep Learning

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Gu
Full Text: PDF
GTID: 2510306491466224
Subject: Computer technology
Abstract/Summary:
Proteins perform a wide variety of functions within eukaryotic cells, so protein sequence classification and prediction is a deeply studied problem in computational biology. The importance of understanding protein function has drawn researchers' attention to improving protein classification and prediction, and with the development of various computational methods, improving prediction efficiency has become a central concern. Two approaches are effective: finding a powerful, optimal feature set, and using a powerful predictive neural network model. In the past, for example, bioinformatics researchers have applied machine learning techniques with strong feature sets such as pseudo amino acid composition (PseAAC), the position-specific scoring matrix (PSSM) and biochemical properties (AAindex). This thesis combines strong feature sets with deep neural networks, an approach whose advantages are high accuracy, low cost and a short research cycle, and proposes new protein classification prediction methods to further improve performance.

1. A classification prediction method based on a random forest classifier is proposed. To classify G-protein-coupled receptor (GPCR) sequences, CTDC was used to extract sample features, the MRMD2.0 dimensionality reduction method was used to eliminate redundant features, and a random forest (RF) was used as the classifier to build the classification model. Comparative experiments were carried out across different feature extraction methods, dimensionality reduction methods, classifiers and feature sets. Finally, for the classification of GPCR and non-GPCR protein sequences, the results were compared with previous experimental results. In the visualized binary classification diagram for this model, the positive and negative samples are separated by an obvious dividing line, whereas no obvious dividing line appears between the positive and negative samples in the earlier results. This indicates that the model achieves a certain breakthrough in sample classification prediction.

2. A classification prediction method based on a deep neural network model is proposed. In this study, features were extracted from non-vesicle transporter protein sequences using the 188D protein feature extraction method. A dimensionality reduction method (MRMD) was then used to reduce the 188-dimensional protein features to 39 dimensions and eliminate redundant features. The SMOTE method was used to balance the positive and negative data sets, which were then divided into training and test sets. Finally, the training set was fed into the established neural network model for training, and the parameters were optimized by backpropagation to obtain a better model, which greatly improved prediction accuracy. As shown by the classification prediction results in Table 4-2, the established model achieved a recall of nearly 86% and an accuracy of 71%.
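Two of the preprocessing steps named in the abstract, composition-style feature extraction (the "C" in CTDC) and SMOTE oversampling, can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the single hydrophobicity grouping, and the parameters `k` and `seed` are assumptions chosen for illustration; CTDC as normally used covers several physicochemical properties, and a real pipeline would more likely rely on an established feature-extraction tool and an existing SMOTE library.

```python
import random

# Three-class hydrophobicity grouping commonly used for CTD descriptors
# (polar, neutral, hydrophobic). Assumed here for illustration; the thesis
# may use different or additional property groupings.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctd_composition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Return the fraction of residues that fall in each group."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return [
        sum(sequence.count(aa) for aa in members) / len(sequence)
        for members in groups.values()
    ]


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

Calling `ctd_composition` on a sequence yields one fraction per group, and `smote_like_oversample` returns new feature vectors that lie on line segments between existing minority samples. Oversampling of this kind is usually applied to the training data only, so that synthetic points do not leak into the evaluation set.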
Keywords/Search Tags: Protein classification and prediction, Deep learning, Neural network, Strong feature set