An Essential Protein Identification Method Based On PPI Networks Data And Gene Expression

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Tang
GTID: 2370330611459894
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of the Human Genome Project, biologists have obtained sequencing data for more and more species, and life science research has since gradually shifted its focus to genomics. However, genomics is only one of the foundations for studying the essential characteristics of life. Cell metabolism, signal transduction and gene regulation are all carried out by proteins. Proteins are the basic organic substances that make up cells, and they provide the material basis and biological functions of life activities. Proteins can be divided into two main categories: essential proteins and non-essential proteins. Essential proteins are those whose removal or disruption causes the loss of related biological functions, so that the organism cannot survive. Because essential proteins are indispensable for the physiological activity of cells and the survival of organisms, accurately identifying them is a crucial step in studying cell growth and regulation.

A series of computational methods based on network topology have been proposed for predicting essential proteins, such as degree centrality (DC), information centrality (IC), eigenvector centrality (EC), subgraph centrality (SC), betweenness centrality (BC), closeness centrality (CC), and NC, a measure based on the edge clustering coefficient. With the continuing development of high-throughput instruments and technology, more and more experimental data have become available. As a result, essential protein prediction algorithms that combine gene expression data with PPI networks are now widely used, such as PeC, which integrates gene expression data with PPI network data, and P&E, which is based on a weighted centrality measure. However, the volatility of gene expression data strongly affects the accuracy of essential protein identification. To address this problem, this study builds on the protein-protein interaction (PPI) network and analyzes the noise in gene expression data in order to improve the accuracy of essential protein prediction:

(1) Based on the PPI network and gene expression data, this study proposes JDC, an essential protein identification algorithm based on protein clustering characteristics and gene "activity" expression. Existing essential protein prediction methods rely on large amounts of feature data, which tends to increase the computational cost; this study therefore uses only the commonly available PPI network and gene expression data. Starting from the premise that proteins tend to cluster, the effect of noise in the gene expression data is reduced by treating each gene's expression as "active" or "inactive" at different time points. From a graph-theoretic perspective, edges are weighted using the edge clustering coefficient (ECC) and the Jaccard coefficient, yielding an essential protein identification method with a high recognition rate and good specificity.

(2) The analysis in this study shows that the Jaccard fluctuation coefficient performs well in predicting essential proteins, so the Jaccard coefficient is further studied at the node level. For each protein node, its Jaccard fluctuation coefficient is fused with existing prediction algorithms, yielding a new prediction algorithm. To verify its effectiveness, the algorithm was evaluated against several existing essential protein prediction algorithms. The results show that fusing the Jaccard fluctuation coefficient improves the identification of essential proteins.
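As a rough illustration of the topology measures named in the abstract, the Python sketch below computes DC, the edge clustering coefficient ECC(u, v) = z_uv / min(d_u - 1, d_v - 1) (where z_uv is the number of triangles through edge (u, v) and d is the degree, as in the usual NC definition), NC as the sum of ECC over a protein's edges, and the Jaccard similarity of two proteins' neighbourhoods. The abstract does not give JDC's actual formula, so `hypothetical_edge_score` and its mixing weight `alpha` are illustrative assumptions only, not the thesis's method. The adjacency-set graph representation and all function names are likewise hypothetical.

```python
# Undirected PPI network: protein ID -> set of interacting protein IDs (hypothetical layout).
Graph = dict[str, set[str]]


def degree_centrality(g: Graph) -> dict[str, int]:
    """DC: the number of interaction partners of each protein."""
    return {v: len(nbrs) for v, nbrs in g.items()}


def ecc(g: Graph, u: str, v: str) -> float:
    """Edge clustering coefficient: triangles through (u, v) over the most that could exist."""
    z = len(g[u] & g[v])  # common neighbours = triangles containing edge (u, v)
    possible = min(len(g[u]) - 1, len(g[v]) - 1)
    return z / possible if possible > 0 else 0.0


def nc(g: Graph) -> dict[str, float]:
    """NC: sum of ECC over all edges incident to each protein."""
    return {v: sum(ecc(g, v, u) for u in nbrs) for v, nbrs in g.items()}


def jaccard(g: Graph, u: str, v: str) -> float:
    """Jaccard similarity of the (open) neighbourhoods of u and v."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def hypothetical_edge_score(g: Graph, alpha: float = 0.5) -> dict[str, float]:
    """HYPOTHETICAL: blend ECC and Jaccard per edge, then sum over each protein's edges.
    The abstract does not specify JDC's formula; alpha is an illustrative mixing weight."""
    return {
        v: sum(alpha * ecc(g, v, u) + (1 - alpha) * jaccard(g, v, u) for u in nbrs)
        for v, nbrs in g.items()
    }


if __name__ == "__main__":
    # Toy network: triangle A-B-C plus a pendant protein D attached to A.
    ppi: Graph = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
    print(degree_centrality(ppi))  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
    print(nc(ppi))                 # {'A': 2.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```

In this toy network DC ranks A highest, while NC scores A no higher than B and C because its edge to D lies in no triangle. That difference is the kind of clustering signal ECC-based measures are meant to capture.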
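The abstract reduces gene-expression noise by treating each gene as "active" or "inactive" at each time point, but it does not state the rule used. The Python sketch below assumes a three-sigma-style threshold of the kind used in the dynamic PPI network literature, computed per gene over its time series as mu + k * sigma * (1 - F) with F = 1 / (1 + sigma^2). It then scores an interaction by how often both genes are active together. The threshold form, the multiplier `k` (set to 1.0 here because the toy series are short), and the `co_activity` score are assumptions for illustration, not confirmed details of JDC.

```python
import statistics
from collections.abc import Sequence


def active_threshold(series: Sequence[float], k: float = 1.0) -> float:
    """ASSUMED activity threshold: mu + k * sigma * (1 - F), with F = 1 / (1 + sigma^2).
    Borrowed from three-sigma-style rules; not confirmed as the thesis's rule."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    f = 1.0 / (1.0 + sigma ** 2)  # high-variance genes get a threshold closer to mu + k*sigma
    return mu + k * sigma * (1.0 - f)


def active_time_points(series: Sequence[float], k: float = 1.0) -> set[int]:
    """Indices of time points at which the gene's expression exceeds its threshold."""
    th = active_threshold(series, k)
    return {t for t, x in enumerate(series) if x > th}


def co_activity(a: Sequence[float], b: Sequence[float], k: float = 1.0) -> float:
    """HYPOTHETICAL edge score: time points where both genes are active,
    divided by time points where either gene is active (0.0 if neither ever is)."""
    ta, tb = active_time_points(a, k), active_time_points(b, k)
    either = ta | tb
    return len(ta & tb) / len(either) if either else 0.0


if __name__ == "__main__":
    gene_a = [0, 0, 0, 4]
    gene_b = [0, 0, 4, 4]
    print(active_time_points(gene_a))   # {3}
    print(active_time_points(gene_b))   # {2, 3}
    print(co_activity(gene_a, gene_b))  # 0.5
```

A score like this could down-weight interactions whose endpoints are rarely active at the same time, which is one plausible way to limit the influence of noisy expression values on an edge-weighted PPI network.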
Keywords/Search Tags: essential proteins, protein-protein interaction network, Jaccard similarity, edge clustering coefficient