Study On Essential Protein Recognition Algorithm By Using Biological Information

Posted on: 2022-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wei
GTID: 2480306515466934
Subject: Software engineering
Abstract/Summary:
Proteins are one of the main components of life, and the structure and characteristics of organisms are related to proteins. Most proteins cannot perform their functions alone; they act by interacting with other proteins. Protein-protein interaction systems form the basis of all life activities. A protein interaction network is built from proteins and the connections between them: each protein is a node of a graph, and each pair of interacting proteins is an edge. According to the importance of their roles in life activities, proteins can be divided into two categories: essential and non-essential. An essential protein is one whose removal by a knockout mutation leaves the organism unable to survive. In complex network theory, essential proteins can be regarded as important nodes in protein networks. Essential proteins are critical to understanding the survival and function of organisms, so their prediction and recognition has become an important research topic in biological information networks.

Research on complex network theory has developed steadily in recent years. With this theoretical tool, we can study various large and complex systems and address problems such as node centrality, community detection, and network communication. Essential proteins can be studied from two directions in complex network theory: node centrality and the fusion of biological information. Node centralities measure the relative importance of each node in the network; they include degree centrality, betweenness centrality, subgraph centrality, eigenvector centrality, and so on. Fusion of biological information refers to enriching the description of proteins with biological information about the organism, such as complex information, subcellular localization information, and gene expression. A protein complex is a module of proteins with high cohesion and low coupling whose members cooperate to carry out specific biological functions. Subcellular localization identifies the location within the cell where each protein is found. In this thesis, based on the topology of the protein interaction network, we use complex network theory and fuse protein complex and subcellular localization information to study recognition algorithms for essential proteins. The thesis focuses on three aspects:

1. Two methods that fuse complex information are proposed to identify essential proteins. In the whole network, the importance of a node depends not only on the node itself but also on its neighbor nodes. The topological structure of the node within a complex is also a crucial factor. The two methods combine the local and global topological characteristics of a node in the network, and also consider the structural characteristics of the node and its neighbors within the complex, giving a comprehensive description of the protein. Nodes are then sorted by score in descending order to obtain the predicted essential proteins. The two methods, called CDC and CIBD, effectively reduce the influence of noise from a single data source on prediction accuracy and improve the recognition accuracy of essential proteins in protein interaction networks. As computational methods, they also offer an alternative to biological experiments, which are expensive and time-consuming.

2. The mixed clustering coefficient centrality and the extended information centrality are proposed to identify essential proteins. In network topology, the clustering coefficient is an important measure of node importance, so it is applied here to the identification of essential proteins. In the whole network, the clustering coefficient of each node is uniquely determined, but the clustering coefficient of a node differs from one complex to another. The mixed clustering coefficient centrality therefore extends the traditional clustering coefficient centrality of complex networks. First, the clustering coefficient of a node within a complex is defined. Then a comprehensive measure for each node, called CENC, is defined by combining the node and edge clustering coefficients in the network with the clustering coefficient in the complex. An extended information centrality is also proposed, with the aim of better identifying essential proteins within complexes. First, a new method is defined to find important nodes in complexes. Second, the frequency of the complex is incorporated. The node score is obtained by combining the centrality over the whole network topology with the method within complexes; this is named the EIC method. Both methods improve the accuracy of essential protein prediction.

3. A centrality that fuses subcellular and complex information is proposed to identify essential proteins. Building on the complex information, this method adds subcellular localization information, which further enriches the description of protein nodes. The importance of a protein with respect to subcellular localization depends on how frequently the node appears in subcellular locations and on the importance of each location. The importance of a protein with respect to complexes is obtained from how frequently the node and its neighbors appear in complexes. Because it is based on subcellular and complex information, the algorithm is called subcellular and complex centrality (SAC). The experiments show that enriching biological information is an effective way to improve the recognition of essential proteins.
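
As an illustration of the general approach described in the abstract (score each node by a centrality, then sort in descending order), the following minimal sketch computes two standard measures, degree centrality and the local clustering coefficient, on a toy network. It is not an implementation of CDC, CIBD, CENC, EIC, or SAC, whose formulas are not given here; the edge list and the protein names P1 to P5 are hypothetical.

```python
from itertools import combinations

# Hypothetical toy protein interaction network (names are made up).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbor nodes."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0  # fewer than two neighbors: no pairs to check
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc[v] = 2 * links / (k * (k - 1))
    return cc


def rank_descending(scores):
    """Sort nodes by score, highest first; ties are broken by node name."""
    return sorted(scores, key=lambda v: (-scores[v], v))


adj = build_adjacency(edges)
dc = degree_centrality(adj)
cc = clustering_coefficient(adj)
ranking = rank_descending(dc)
print(ranking)
```

In a prediction setting, the top-ranked nodes of such a list would be taken as candidate essential proteins and compared against a set of known essential proteins.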
Keywords/Search Tags: Protein interaction network, Essential protein, Protein complex, Subcellular location, Assessment method