Font Size: a A A

Research On Motif Discovery Algorithms Of Protein Networks And Its Applications In Identifying Essential Proteins

Posted on:2016-05-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:G H LiFull Text:PDF
GTID:1360330473967136Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of high-throughput proteomics,a vast amount of protein interaction data has been collected in mult iple public databases.How to identify the substructure which associated with the biological functions from these large-scale protein interaction networks is one of the current hot research topics in computational biology.Network motifs may represent evolutionary conserved topological units of a protein interaction network.Study of network motifs may help understand the structure,function and evolutionary design principles of protein network,and provide vital means of recognizing the internal tissues and processes about the living system in a cell systematically.Discovering network motifs involves subgraph searching and graph isomorphic detection,both of which are high computational complexity.The needed execution time grows exponentially as the size of networks or motifs increases,thus limiting their applicability.Developing efficient and scalable algorithms for motifs discovery are of great theoretical significance and application value.On the other hand,starting from biological meanings of network motifs,how to remove noise data effectively in biological network and make discovered subgraphs are more biological significance is also a problem.In this thesis,we take real protein networks as research objects and focuses on the research of two types of motif discovery problems from a computational and biological perspective respectively,i.e.struct ural network motif and biological network motif.For the structural network motif,by considering the fact that there are many false posit ives and false negative data in protein networks generated by high-throughput biological experimental techniques,we use structural network motifs refer to non-induced subgrpahs.As for the biological network motif,it is considered to be induced subgraphs because the targe network has removed false positives before searching subgraphs.In addit ion,there may be some internal connections between essential protein and motif.We finally design a new centrality algorithm wit h network motifs,and score proteins in a protein-protein interaction(PPI)network to find essential proteins.The work and contributions are as fellows.(1)Motivated by the fact that any connected subgraph that is non-treelike can be generated by expanding a corresponding subtree through adding edges,the whole subgraph search process are divided into two steps: one is the treelike subgraphs,the other is non-treelike subgraphs.For treelike subgraph,we propose a new algorithm to efficiently count k-size subtrees.Our algorithm is based on the enumeration of all connected sets of k-1 edges,incorporates a labeled rooted tree data structure in the enumeration process to reduce the number of isomorphis m tests required and uses an array-based indexing scheme to simplify the subtree counting method.Fina lly,based on the found subtree,we consider other non-treelike subgraphs by traversing an expansion tree in depth-first manner.The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the art subgraph counting algorithms.(2)In order to eliminate the need to performany isomorphic detection,we present a new algorithm that incorporates combinatorial techniques to countnon-induced occurrences of subgraph topologies in the form of trees.Our algorithm makes use of combinatorial techniques to count the number of isomorphic patterns instead of listing non-induced subtrees.Furthermore,according to different tree topologies may have the same substructure,we could compute for the number of non-induced occurrences of multiple trees simultaneously and decrease the repeated search for the substructure.Finally,we use the revolving door ordering algorithm which can generate combinations of vertices effic iently to list the substructure.The experiments show that the proposed algorithm is roughly an order of magnitude faster than existing classic approaches.(3)Starting from biological significance of network motifs,we propose a new algorithm for extracting biological network motifs efficient ly,which integrates edge-clustering coefficient and semant ic similarit y of GO terms to evaluate the biological significance of interactions comprehensively.The main idea of the algorithm is to reduce the number of subgraphs to be searched by removing a number of ‘biologically insignificant' edges from the original network and,at the same time,increase the discovery rate for biological network motifs.The proposed algorithm has a good fusion between functional information and topological properties of the networks,which can solve false positives effectively in biological network and make discovered subgraphs have a high proportion in complex and functional module.Experimental results show that our algorithm is applicable to find structural network motifs as well.(4)By considering the facts that motifs represent evolutionary conserved topological units of a protein interaction ne twork and essential proteins are more evolut ionarily conserved than the nonessential proteins,a new algorithm of identifying essential proteins base on subgraph density of motif extension is proposed.A node's essentiality is determined by the sum of the extended subgraph density of motifs in which it participates.Furthermore,because of the fact that some false posit ive information exists in protein interaction networks,we use edge clustering coeffic ient to preprocess these networks and then combine the proposed algorithm to identify essentia l proteins in the modified networks.The experimental results show that the number of essential proteins discovered by our two algorit hms universally exceeds that discovered by the other traditional centrality measures.
Keywords/Search Tags:Computational Biology, Protein Interaction Network, Network Motif, Essential Proteins
PDF Full Text Request
Related items