Prediction of Membrane Protein Amphiphilic Helix Based on Horizontal Visibility Graph and Graph Convolution Network

Posted on: 2024-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: B L Jia
GTID: 2530306935499584
Subject: Information and Communication Engineering
Abstract/Summary:
According to statistics, 20 to 30% of the genes in most organisms encode membrane proteins. Membrane proteins are proteins that span, or are attached to, the cell membrane. An amphipathic (amphiphilic) helix is a segment of a membrane protein sequence that lies approximately parallel to the membrane. It is a common secondary structural motif in proteins and peptides and is defined as an alpha helix with opposing polar and non-polar faces along its long axis. The amphipathic helix differs from other structures within membrane proteins in that one side is located outside the membrane while the other is located inside it. Because of the complexity of the amphipathic helix structure, the number of amphipathic helices with known structures is much smaller than their total number. Nevertheless, amphipathic helices play an important role in cellular activities such as inducing membrane tubule formation, recognizing membrane curvature, inhibiting viral fusion, and protecting cell membranes or lipid droplets. Biologists have mainly relied on traditional experimental techniques such as surface NMR, cryo-EM and X-ray crystallography, which are slow, costly and labor-intensive. Computational prediction of the amphipathic helix structure of membrane proteins is less time-consuming and less expensive, and can therefore mitigate the drawbacks of traditional experiments. In this thesis, we predict the amphipathic helix structure of membrane proteins based on the horizontal visibility graph (HVG) and the graph convolution network (GCN). The main contents are as follows:

(1) A prediction method for membrane protein amphipathic helices based on TextCNN is proposed. A new dataset of amphipathic helices is constructed. First, all alpha-helical transmembrane proteins are downloaded from the PDBTM database. Then, following the most stringent redundancy criterion for homologous protein sequences, the CD-HIT toolkit is used to remove redundant sequences, and sequences are retained at the 30% homology level. Finally, the amphipathic helix structures are determined from the relevant literature and the three-dimensional structures of the membrane proteins in the PDB database, which yields the dataset used in this thesis. For each membrane protein sequence in the dataset, the hidden Markov model (HMM) profile, secondary structure and hydrophobicity scale are extracted. A sliding window and TextCNN are used to extract the local and global features of the protein, respectively. The two types of features are concatenated into a feature matrix and fed into a fully connected network for classification, which improves the classification of membrane protein amphipathic helices.

(2) A prediction method for membrane protein amphipathic helices based on HVG and TextCNN is presented. This part studies a graph feature extraction method for membrane proteins based on HVG. The predicted secondary structure features of a protein are mapped to a time series using chaos game representation (CGR), and the HVG algorithm converts the time series into a corresponding complex network. A protein sequence contains hundreds of amino acids, and each amino acid is represented as a node. In the complex network, the relationships between nodes represent the interactions between amino acids, so the adjacency matrix extracted from the network can serve as the graph feature of the protein. The node features are processed in the same way as in (1). Node features and graph features are fed into the GCN, which aggregates them iteratively to classify the amphipathic helices of membrane proteins.

(3) A prediction method for membrane protein amphipathic helices based on BiLSTM and GCN is proposed, together with a new membrane protein feature. The hydrophobicity scale reveals physical properties of the amphipathic helices of membrane proteins and has great potential to improve prediction accuracy. In this thesis, the hydrophobicity values of a fixed-length stretch of amino acids are analyzed statistically: a window is slid over the hydrophobicity values of a central amino acid and its left and right neighbors, and the mean and standard deviation within the window are taken as a new hydrophobic statistical feature. This feature is fused with the HMM profile, secondary structure and hydrophobicity scale to form the node features. A sliding window and BiLSTM are used to capture local contextual information and long-distance dependencies. Combined with the graph features extracted by the HVG algorithm, the GCN performs the classification, and the method is validated on the dataset. The experimental results show that, compared with existing methods, the proposed method effectively improves the prediction accuracy for membrane protein amphipathic helices.
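
To make the graph and feature steps in (2) and (3) concrete, the following is a minimal Python sketch and not the thesis implementation. It shows how a numeric series can be turned into a horizontal visibility graph adjacency matrix, how a windowed mean and standard deviation of hydrophobicity can be computed per residue, and how one normalized graph-convolution step combines node features with the adjacency matrix. Everything named here is an assumption made for illustration: the function names, the Kyte-Doolittle scale (the abstract does not say which hydrophobicity scale is used), the window half-width of 3, the truncation of windows at the sequence ends, the toy input series (the thesis derives its series from secondary structure via CGR), and the random weights in the graph-convolution step.

```python
import numpy as np

# Kyte-Doolittle hydropathy values; the choice of scale is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def horizontal_visibility_adjacency(series):
    """Adjacency matrix of the horizontal visibility graph of a series.

    Nodes i < j are linked when every value strictly between them is
    lower than both series[i] and series[j].
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n - 1):
        highest_between = -np.inf
        for j in range(i + 1, n):
            if highest_between < min(x[i], x[j]):
                adj[i, j] = adj[j, i] = 1
            if x[j] >= x[i]:
                break  # x[j] blocks the horizontal view from i to later nodes
            highest_between = max(highest_between, x[j])
    return adj


def windowed_hydrophobicity_stats(sequence, half_width=3):
    """Per-residue mean and standard deviation of hydrophobicity.

    The window covers the central residue and up to `half_width`
    neighbours on each side; it is truncated at the sequence ends.
    """
    values = np.array([KYTE_DOOLITTLE[aa] for aa in sequence])
    stats = np.zeros((len(values), 2))
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        stats[i] = (window.mean(), window.std())
    return stats


def gcn_layer(adj, features, weights):
    """One graph-convolution step with self-loops, symmetric degree
    normalization and a ReLU activation."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ features @ weights, 0.0)


# Toy usage: one node per residue, random weights for illustration only.
sequence = "MKTLLVAGIW"
node_features = windowed_hydrophobicity_stats(sequence)
toy_series = [KYTE_DOOLITTLE[aa] for aa in sequence]  # stand-in series
adjacency = horizontal_visibility_adjacency(toy_series)
rng = np.random.default_rng(0)
hidden = gcn_layer(adjacency, node_features, rng.normal(size=(2, 4)))
print(adjacency.shape, node_features.shape, hidden.shape)
```

In a full pipeline the node feature matrix would also carry the HMM profile and secondary structure columns, and the layer weights would be learned rather than sampled; the sketch only illustrates how the adjacency matrix and the per-residue features meet in the graph convolution.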
Keywords/Search Tags: membrane protein, amphipathic helix, complex network, horizontal visibility algorithm, graph convolutional network