Data-Driven Research On Algorithms Of Protein Complex Detection In Protein Interaction Networks

Posted on: 2020-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 1360330578972959
Subject: Computer application technology
Abstract/Summary:
As a research hotspot in bioinformatics, detecting protein complexes in protein-protein interaction (PPI) networks has scientific significance for the analysis of relational data, the characterization of network structure, and the exploration of life activities. It also plays an important role in protein function annotation, disease analysis, and drug design. Many existing computational approaches to protein complex detection focus on subgraph mining at the methodological level. With the development of complex network feature analysis and a deeper understanding of protein complexes, the characteristics of PPI networks (such as the small-world and scale-free properties) and of protein complexes (such as overlap and small scale) pose further challenges for detection algorithms that still need to be studied. This thesis focuses on protein complex detection in PPI networks, takes the characteristics of both PPI networks and protein complexes into account, and studies effective detection algorithms from different perspectives. The main contributions are as follows:

(1) For overlapping and small-scale protein complexes, a complex detection algorithm based on flow simulation is proposed. Drawing on network flow theory, methods for measuring edge capacity and node importance are defined from each node's direct neighborhood information. The flow process is simulated over the local connections of the network, so that connected regions of the network receive flow from different sources. The conditions for determining clusters are designed with reference to the linear threshold model of information propagation. The experimental results show that the algorithm can find overlapping clusters and small-scale clusters at the same time and can detect protein complexes effectively. It provides a new way of applying flow simulation to the design of complex detection algorithms.

(2) To address the overlapping nature of protein complexes and the assortativity that describes the linking tendency in PPI networks, a novel overlapping complex detection algorithm based on assortativity is proposed. The algorithm measures node importance within the second-order neighborhood of each node. By introducing network assortativity, multiple candidate nodes are added during the clustering process. To evaluate the accuracy of overlapping protein complex predictions, an evaluation index for the accuracy of the overlapping relations between clusters is proposed. The algorithm establishes a relationship between network feature analysis and network module discovery. The experimental results show that the proposed algorithm can detect overlapping protein complexes effectively.

(3) In view of the dense-center, sparse-periphery structure of protein complexes, a seed-expansion algorithm based on multi-information fusion is proposed, which exploits the abundant local structural information in PPI networks. The algorithm uses a linear combination model to perform a weighted fusion of multiple kinds of network structure information, and then defines a node metric within each node's k-neighborhood. A probability model is applied to seed selection to make more effective use of the structural information and to diversify the predicted results. Cluster density and peripheral connections are used to characterize the cluster structure with a dense center and a sparse periphery. The experimental results show that the new seed-expansion algorithm performs well for protein complex detection.

(4) Considering the rich topological characteristics of PPI networks, such as the small-world phenomenon and the scale-free power law, a protein complex detection algorithm based on multiple topological characteristics is proposed. Through correlation analysis, the algorithm introduces network characteristics into the node metric within the k-neighborhood. By combining the scale-free power-law distribution with node degree, a model describing clusters with a dense center and a sparse periphery is constructed and analyzed. The use of multiple network characteristics further strengthens the relationship between network characteristic analysis and network module discovery. The experimental results show that the proposed algorithm can detect protein complexes effectively.

In this thesis, effective protein complex detection algorithms are proposed according to the characteristics of PPI networks and protein complexes and in response to the shortcomings of existing detection algorithms. These contributions enrich the methods available for analyzing network data, promote the combination and development of computer science and biology, and provide application support for essential protein identification, functional annotation, disease analysis, and related problems.
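The seed-expansion idea in contribution (3) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give the fusion weights, the node metric, or the probability model for seed selection, so the sketch below substitutes simpler stand-ins and labels them as assumptions. Seeds are tried in order of node degree (instead of a probabilistic choice), a cluster grows by adding the frontier node with the most links into it, and growth stops when cluster density would fall below a hypothetical threshold `min_density`. The function names and the toy graph are illustrative only.

```python
from itertools import combinations


def density(graph, cluster):
    """Fraction of possible node pairs in `cluster` that are connected."""
    n = len(cluster)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(cluster, 2) if v in graph[u])
    return 2.0 * edges / (n * (n - 1))


def expand_seed(graph, seed, min_density=0.7):
    """Grow a cluster from `seed` while it stays dense (assumed stopping rule)."""
    cluster = {seed}
    while True:
        # Frontier: neighbors of the cluster that are not yet members.
        frontier = set().union(*(graph[u] for u in cluster)) - cluster
        best, best_links = None, 0
        for v in sorted(frontier):
            links = len(graph[v] & cluster)
            if links > best_links and density(graph, cluster | {v}) >= min_density:
                best, best_links = v, links
        if best is None:
            return cluster
        cluster.add(best)


def detect_complexes(graph, min_density=0.7, min_size=3):
    """Try every node as a seed, highest degree first (an assumption here)."""
    seeds = sorted(graph, key=lambda u: (-len(graph[u]), u))
    found = []
    for s in seeds:
        c = expand_seed(graph, s, min_density)
        # Nodes are never removed from the graph, so clusters may overlap.
        if len(c) >= min_size and c not in found:
            found.append(c)
    return found


# Toy PPI network: two triangles that share protein "c".
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
print(detect_complexes(toy))
```

On the toy network the sketch reports two clusters, {a, b, c} and {c, d, e}, which overlap on the shared protein "c". Because each expansion starts from the full graph, a protein can belong to more than one predicted complex, which is the overlapping behavior the abstract emphasizes.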
Keywords/Search Tags: Protein Interaction Network, Complex Detection, Graph Clustering, Flow Simulation, Seed Extension Method, Network Topological Properties, Network Characteristics