Research On Essential Protein Identification Algorithm

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhao
GTID: 2370330629452705
Subject: Computer application technology
Abstract/Summary:
Essential proteins are vital to the survival of an organism; their absence causes disease or death. Identifying essential proteins is therefore conducive to the study of cell function and biological mechanisms. In recent years, a large number of algorithms have been proposed that identify essential proteins from a protein-protein interaction network (PPI network). The traditional PPI network is constructed from protein interaction data (PPI data) measured in biological experiments, but these data contain many false positives, and how to filter them effectively needs further study. In addition, the importance of a protein in a PPI network does not depend only on the characteristics of its local neighbors; each protein should be described with respect to the whole network. To address these problems, this thesis proposes two essential protein identification algorithms. The research contents and innovations are as follows.

(1) TSW, an essential protein identification algorithm based on a PPI network of interactions active at the same time and in the same place, is proposed. TSW first fuses biological data to construct a PPI network of interactions that are active in the same time and space. It then measures the affinity of two proteins from the topological properties of the PPI network, using an improved edge clustering coefficient (ECC), and assigns that value as the weight of the interaction edge. In this way TSW builds a more reliable PPI network from the inherent biological properties of the proteins and integrates the topological properties of the network to build the final weighted PPI network. Experimental results show that TSW identifies essential proteins more accurately than other traditional algorithms. The innovations of TSW are as follows. First, gene expression data and subcellular location data are used to construct a network of protein interactions active in the same time and space, which effectively filters false positive data while describing the biological properties of the proteins themselves. Second, an improved ECC, called WNECC, is proposed. WNECC describes the central topological properties not only of first-order common neighbors but also of second-order common neighbors.

(2) RWLR, an essential protein identification algorithm based on LeaderRank with a restart mechanism in a weighted protein interaction network, is proposed. RWLR first constructs a weighted PPI matrix from GO annotations and protein complexes, and then sets an initial score for each protein in the PPI network according to the TSW algorithm. Finally, it iterates over the weighted PPI matrix and the initial protein scores using LeaderRank with a restart mechanism. When the scores of all proteins in the PPI network converge, the algorithm ends, and the resulting score vector gives the final score of every protein. RWLR provides a global description of each protein in the entire PPI network and, to a certain extent, overcomes the limitations of TSW. Compared with TSW, RWLR identifies essential proteins more accurately. The innovations of RWLR are as follows. First, LeaderRank is improved by adding a restart mechanism, so that with a certain probability the algorithm returns to the initial score vector set for each protein according to TSW. Second, the improved LeaderRank algorithm is applied to the identification of essential proteins. A weighted PPI matrix is constructed from two kinds of biological data, GO annotations and protein complexes, and RWLR iterates over this matrix and the initial protein score vector, which gives a global description of each protein relative to the whole PPI network.

Although existing essential protein identification algorithms integrate more biological properties than traditional algorithms based purely on PPI network topology, there is still room for improvement. At present, quantitative descriptions of biological attributes do not capture biological information well. Future research should therefore combine deep learning with automatic learning of biological features to make up for the shortcomings of existing algorithms.
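The restart-style iteration described for RWLR can be illustrated with a small sketch. The abstract does not give the update rule, the restart probability, or how the ground node of LeaderRank is weighted, so everything below is an assumption made for illustration: the function name `restart_rank`, the restart probability of 0.3, the ground-node weight of 1.0, and the toy edge weights and initial scores (which stand in for the GO/complex-based weights and the TSW scores).

```python
def restart_rank(weights, initial, restart=0.3, tol=1e-9, max_iter=1000):
    """Generic LeaderRank-style walk with restart (illustrative, not the thesis code).

    weights: {(u, v): w} undirected weighted PPI edges (hypothetical values).
    initial: {protein: score} starting scores, e.g. from a first-stage method.
    """
    nodes = sorted(initial)
    ground = "__ground__"  # ground node linked to every protein, as in LeaderRank

    # Build a symmetric weighted adjacency map, including the ground node.
    adj = {n: {} for n in nodes + [ground]}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    for n in nodes:
        adj[n][ground] = 1.0  # assumed ground-link weight
        adj[ground][n] = 1.0

    # Normalise the initial scores into the restart distribution.
    total = sum(initial.values())
    start = {n: initial[n] / total for n in nodes}
    start[ground] = 0.0

    strength = {n: sum(adj[n].values()) for n in adj}
    score = dict(start)
    for _ in range(max_iter):
        # With probability `restart`, jump back to the initial score vector.
        new = {n: restart * start[n] for n in adj}
        # Otherwise, spread each node's score along its weighted edges.
        for u in adj:
            for v, w in adj[u].items():
                new[v] += (1.0 - restart) * score[u] * w / strength[u]
        delta = sum(abs(new[n] - score[n]) for n in adj)
        score = new
        if delta < tol:  # scores have converged
            break

    # As in LeaderRank, share the ground node's score evenly among proteins.
    share = score[ground] / len(nodes)
    return {n: score[n] + share for n in nodes}


# Toy network: protein A interacts strongly with B, C and D; E hangs off D.
toy_weights = {("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.7, ("D", "E"): 0.2}
toy_initial = {p: 1.0 for p in "ABCDE"}
scores = restart_rank(toy_weights, toy_initial)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the restart term keeps pulling the walk back toward the initial vector, the final ranking blends the first-stage scores with the global structure of the weighted network, which matches the motivation given for RWLR above.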
Keywords/Search Tags:Essential Protein, PPI Network, Gene Expression Profile, Subcellular Location, GO Annotation, Protein Complex, LeaderRank With Restart Mechanism