
Prediction Of Protein Subcellular Localization Based On ResC-LSTM

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: H F Wu
GTID: 2370330602982522
Subject: Software engineering
Abstract/Summary:
Proteins are essential components of the body's tissues and organs and carry out a variety of functions in the different compartments of eukaryotic cells. A protein's function depends on the compartment or organelle in which it resides, because that compartment provides the physiological environment the function requires. Subcellular localization is therefore a key factor in protein functional annotation and helps make complex drug design possible. Aberrant subcellular localization, however, can alter a protein's function and may contribute to the pathogenesis of many human diseases, including metabolic, cardiovascular, and neurodegenerative diseases as well as cancer. Predicting protein subcellular localization has consequently become one of the hot topics in bioinformatics. This thesis uses a deep learning network framework to study subcellular localization. The specific research work is as follows:

(1) In terms of data input, considering the importance of the protein's N-terminal sequence for subcellular localization, we retain the N-terminal features and add the statistical features of residues, the annotation features of GO terms, and nearest-neighbor functional domain features. These features include the position-specific scoring matrix (PSSM), GO term coefficients, and pseudo amino acid composition, all of which express information related to the protein sequence well. Finally, the features are integrated into a one-dimensional feature vector that serves as the input.

(2) This thesis builds ResC-LSTM, a deep learning network framework for protein subcellular localization that combines ResNet, a multi-scale convolutional neural network (CNN), and a bidirectional LSTM. The framework accepts the one-dimensional features as input. Multi-scale convolution extracts more information from the one-dimensional features; ResNet's residual mapping and identity mapping are then used to process the sequence features effectively; finally, the bidirectional LSTM processes the resulting features further, thereby improving prediction accuracy. For training, we use a cross-entropy loss function to reduce the impact of discrete data and a stochastic gradient descent algorithm to optimize the model parameters.

(3) To verify the effectiveness of the ResC-LSTM framework, we test on two standard datasets (the DeepLoc dataset and the Hoglund dataset), cross-validate on both, and select the better-performing one to provide our training and test sets. After several experiments, the overall accuracy of subcellular localization across ten sites reached 85.3%. We also test the framework on fungal, animal, and plant datasets and compare it with the existing methods for those datasets. The results show that the ResC-LSTM framework also performs well on these other datasets. Finally, we summarize the research work on protein subcellular localization and look ahead to future work.
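
The training objective named in (2), a cross-entropy loss minimized by stochastic gradient descent, can be illustrated with a minimal sketch in plain Python. This is not the thesis's implementation: the linear softmax classifier is a hypothetical stand-in for the final layer of a deeper network, and the feature vector, learning rate, and function names are made up for illustration. Only the ten-class setting comes from the abstract.

```python
import math

NUM_CLASSES = 10  # ten localization sites, as stated in the abstract


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def sgd_step(weights, bias, features, label, lr=0.1):
    """One SGD update of a linear softmax classifier on a single example.

    Hypothetical stand-in for a network's output layer; returns the loss
    computed before the update.
    """
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    loss = cross_entropy(probs, label)
    for k in range(len(weights)):
        # Gradient of the loss w.r.t. logit k is p_k - 1[k == label].
        grad = probs[k] - (1.0 if k == label else 0.0)
        weights[k] = [w - lr * grad * x for w, x in zip(weights[k], features)]
        bias[k] -= lr * grad
    return loss


# Toy usage: a made-up 4-dimensional feature vector whose true class is 3.
features = [0.5, -1.0, 0.25, 2.0]
weights = [[0.0] * len(features) for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
losses = [sgd_step(weights, bias, features, label=3) for _ in range(5)]
```

With all-zero initial weights the classifier guesses uniformly over the ten classes, so the first loss equals ln 10; repeated updates on the same example then lower the loss at each step.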
Keywords/Search Tags: Subcellular localization, ResNet, Multi-scale convolutional CNN, Bidirectional LSTM