Identification Of Human Membrane Protein Types By Incorporating Network Methods

Posted on: 2022-01-22  Degree: Master  Type: Thesis
Country: China  Candidate: X L Zhang
GTID: 2480306728986499  Subject: Computer Science and Technology
Abstract/Summary:
Membrane proteins are one of the three major classes of proteins. About 30% of the genes in the genome encode membrane proteins, and about 60% of drug targets are membrane proteins. Membrane proteins play essential roles in cell biology and help maintain the stability of organisms. In addition, the type of a membrane protein has been shown to be highly related to its function, so identifying the type can help infer the function. However, traditional biophysical methods for identifying membrane protein types can hardly meet practical needs. Modern information technology (e.g. machine learning) can therefore be adopted to design novel methods that identify membrane protein types quickly and reliably. Building such a model involves two basic problems: how to represent membrane proteins effectively, and which classification algorithm to use. This study employs several network-based feature extraction schemes, which capture novel information about membrane proteins, and builds several models by combining these features with different classification algorithms. The classification results indicate that the proposed models perform well in predicting membrane protein types. The main research contents are as follows.

(1) A novel network embedding based method is proposed for the prediction of membrane protein types. First, eight protein networks are constructed using the protein-protein interaction information extracted from STRING. Then, Mashup is used to process the first seven protein networks, and Node2Vec is used to process the last, comprehensive network. The resulting features are used to construct a single-network model and a multiple-network model. The Synthetic Minority Over-sampling Technique (SMOTE) is used to reduce the influence of the imbalanced sizes of the different membrane protein types. Furthermore, the two network models are integrated into one model by grid search. The classification results on the training and independent test sets show that the integrated model is superior to either single model and to previous models, indicating that it is competitive.

(2) A new method is proposed for identifying membrane protein types that fuses protein network and sequence information. First, after the protein sequence and network information is obtained, a novel feature fusion method is used to fuse this information into one vector. The protein sequence information is reconstructed as a linear combination, in which each combination coefficient is the probability produced by a random walk with restart that uses the given protein as the seed node. Then, seven different classification algorithms are trained on two different fused feature vectors to construct the fusion models. Finally, the proposed model is compared with models that use only sequence information or only network information. The results show that fusing the two kinds of information improves the prediction of membrane protein types.
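The reconstruction step in (2) can be illustrated with a minimal sketch. The abstract does not give the restart probability, the network weights, the sequence descriptor, or how the two fused vectors are finally assembled, so everything concrete below is an assumption made for illustration: the function names `random_walk_with_restart` and `reconstruct_features`, the restart probability of 0.8, and the three-protein toy network with two-dimensional features. The sketch only shows the stated idea, namely that each protein's sequence feature vector is rebuilt as a linear combination of all proteins' vectors, weighted by the random-walk-with-restart probabilities obtained with that protein as the seed node.

```python
from typing import Dict, List

Network = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def random_walk_with_restart(adj: Network, seed: str, restart: float = 0.8,
                             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate p <- (1 - restart) * W p + restart * e_seed until convergence.

    W moves probability from each node to its neighbours in proportion to
    edge weight. The restart value is an assumption, not taken from the thesis.
    """
    nodes = list(adj)
    probs = {n: 0.0 for n in nodes}
    probs[seed] = 1.0  # the walk starts at the seed protein
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Isolated node: send its mass back to the seed so the total stays 1.
                nxt[seed] += (1 - restart) * probs[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * probs[u] * w / total
        nxt[seed] += restart  # restart mass always returns to the seed
        change = sum(abs(nxt[n] - probs[n]) for n in nodes)
        probs = nxt
        if change < tol:
            break
    return probs


def reconstruct_features(adj: Network, seq_features: Dict[str, List[float]],
                         seed: str, restart: float = 0.8) -> List[float]:
    """Rebuild the seed protein's features as an RWR-weighted sum over all proteins."""
    probs = random_walk_with_restart(adj, seed, restart)
    dim = len(seq_features[seed])
    fused = [0.0] * dim
    for node, prob in probs.items():
        for i, value in enumerate(seq_features[node]):
            fused[i] += prob * value
    return fused


# Hypothetical toy data: three proteins with weighted interactions
# and made-up two-dimensional sequence features.
toy_network: Network = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.4},
    "C": {"B": 0.4},
}
toy_features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}

toy_probs = random_walk_with_restart(toy_network, "A")
toy_fused = reconstruct_features(toy_network, toy_features, "A")
```

Because the probabilities sum to one, the reconstructed vector is a convex combination of the original vectors: it stays close to the seed protein's own features while absorbing some signal from its network neighbours.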
Keywords/Search Tags: membrane protein, network embedding method, protein-protein interaction network, integrated model, random walk with restart algorithm