
Research On Algorithms For Identifying Protein Complexes Based On Protein Network

Posted on: 2016-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q G Dai
GTID: 1108330479478719
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are among the most important biological molecules. Biological studies have shown that a protein rarely acts alone in cellular processes. Instead, several proteins associate through physical interactions to form a larger molecular structure called a protein complex. Complexes are the main form in which proteins perform their functions, and many important biological processes in a cell are carried out by protein complexes. Accurately identifying the protein complexes in a cell is therefore of great significance for revealing protein activity and understanding protein function. A protein network is a biological network that describes the interactions between proteins. Using computational methods to identify protein complexes from protein networks is one of the current research hotspots in bioinformatics. This thesis therefore focuses on complex detection based on protein networks. It studies several kinds of methods: local search, discrete optimization, continuous optimization, and methods based on temporal protein networks. The research covers the following four aspects.
(1) A label propagation algorithm for detecting protein complexes. As a local search method, we propose an algorithm that detects protein complexes by propagating labels over the protein network. The method introduces a label propagation mechanism, which identifies the network modules corresponding to complexes by passing labels between interacting proteins.
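To make the basic mechanism concrete, here is a minimal sketch of single-label propagation on an unweighted protein network. It is not the dissertation's algorithm. The multi-label storage and the adaptive threshold described next are omitted. Two refinements are borrowed only loosely:

- Each interaction is weighted by an assumed propagation intensity of 1 plus the number of neighbors the two proteins share. The abstract does not give the dissertation's exact formula.
- Proteins are updated in an assumed order of decreasing degree.

Ties are broken by the smallest label so that the result is deterministic. The function name `propagate_labels` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, max_iter=100):
    """Single-label propagation on an undirected protein network (illustrative sketch)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def weight(p, q):
        # Assumed propagation intensity: 1 + number of shared neighbors.
        return 1 + len(adj[p] & adj[q])

    labels = {p: p for p in adj}  # each protein starts with its own label
    # Assumed degree-based order: high-degree proteins first, names break ties.
    order = sorted(adj, key=lambda p: (-len(adj[p]), p))
    for _ in range(max_iter):
        changed = False
        for p in order:
            scores = defaultdict(int)
            for q in adj[p]:
                scores[labels[q]] += weight(p, q)
            best = max(scores.values())
            # Adopt the highest-scoring label; the smallest label wins ties.
            new = min(lab for lab, s in scores.items() if s == best)
            if new != labels[p]:
                labels[p] = new
                changed = True
        if not changed:  # converged: no protein changed its label
            break
    # Proteins sharing a final label form one candidate module.
    modules = defaultdict(list)
    for p, lab in labels.items():
        modules[lab].append(p)
    return sorted(sorted(m) for m in modules.values())


# Two triangles joined by a single interaction C-D (toy data, not from the thesis).
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
       ("D", "E"), ("D", "F"), ("E", "F")]
print(propagate_labels(toy))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

The shared-neighbor weight is what keeps the bridge C-D from pulling the two triangles into one module. Without it, this tie-breaking rule lets a single label flood the whole toy graph.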
Tailored to the characteristics of protein complexes, the algorithm includes several improvements:
- a multi-label storage and propagation mechanism that handles overlap between protein complexes;
- a propagation intensity for labels that emphasizes the common neighbors of interacting protein pairs, improving the efficiency of propagation;
- an adaptive-threshold label update strategy that keeps complex sizes at a reasonable level;
- a label update order based on protein degree that improves the robustness of the algorithm.
Experimental results show that the proposed algorithm has advantages in identifying protein complexes. It provides a new and effective local search method for identifying complexes from protein networks.
(2) A complex detection method based on a discrete modularity function. A modularity function measures the quality of the modules in a network and plays an important role in guiding cluster merging in hierarchical clustering. Protein complexes overlap and tend to be small. This dissertation therefore proposes a new modularity function that measures the quality of a module partition corresponding to protein complexes in a network. Compared with the traditional function, the new modularity function has two characteristics. First, it describes overlapping modules better. Second, it avoids the resolution limit and is therefore better suited to small modules such as protein complexes. In addition, to further improve the efficiency of hierarchical clustering, an initial module selection method based on the degree of correlation is proposed. Based on this function and the initial module selection, the dissertation presents a complex detection algorithm.
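The abstract does not define the dissertation's new modularity function. The sketch below therefore substitutes the standard Newman modularity:

Q = sum over modules c of [L_c / m - (D_c / 2m)^2]

Here m is the number of interactions, L_c is the number of interactions inside module c, and D_c is the total degree of the proteins in c. This classic function is known to suffer from the resolution limit mentioned above. It is used here only to show how a modularity score can guide merging in agglomerative hierarchical clustering.

The sketch has several limitations. It starts from singleton modules rather than the correlation-based initial modules. It handles only non-overlapping modules. It recomputes Q from scratch for every candidate merge. The function names are hypothetical.

```python
import itertools


def modularity(edges, modules):
    """Newman modularity Q of a non-overlapping partition of an unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for mod in modules:
        inside = sum(1 for u, v in edges if u in mod and v in mod)  # L_c
        total_degree = sum(degree[p] for p in mod)  # D_c
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


def linked(edges, a, b):
    """True if at least one interaction joins module a and module b."""
    return any((u in a and v in b) or (u in b and v in a) for u, v in edges)


def greedy_merge(edges):
    """Agglomerative clustering that always applies the merge with the largest gain in Q."""
    nodes = sorted({p for e in edges for p in e})
    modules = [frozenset([p]) for p in nodes]  # start from singleton modules
    current = modularity(edges, modules)
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in itertools.combinations(range(len(modules)), 2):
            if not linked(edges, modules[i], modules[j]):
                continue  # only merge modules that actually interact
            rest = [mod for k, mod in enumerate(modules) if k not in (i, j)]
            gain = modularity(edges, rest + [modules[i] | modules[j]]) - current
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:  # no merge increases Q any further
            break
        i, j = best_pair
        rest = [mod for k, mod in enumerate(modules) if k not in (i, j)]
        modules = rest + [modules[i] | modules[j]]
        current += best_gain
    return sorted(sorted(mod) for mod in modules), current


# Two triangles joined by a single interaction C-D (toy data, not from the thesis).
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
       ("D", "E"), ("D", "F"), ("E", "F")]
clusters, q = greedy_merge(toy)
print(clusters, round(q, 3))  # [['A', 'B', 'C'], ['D', 'E', 'F']] 0.357
```

Merging stops once every remaining merge would lower Q. A function designed to avoid the resolution limit would change only the scoring step; the merge loop stays the same.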
Extensive experiments verify the effectiveness of the proposed algorithm and show that the method is well suited to protein complex detection. This work is also significant for related research that employs complex modularity functions.
(3) A continuous optimization model and algorithm for complex detection. As a continuous optimization method, this dissertation proposes a least-squares model for complex detection. The model describes the relationship between the input protein network data and the unknown complex partition. The dissertation also introduces further strategies, such as weighting protein interactions and adding a penalty term, to strengthen the descriptive ability of the model. On the basis of this model, a multi-scale method and multiplicative update rules are used to fit the model to a given input protein network. This fit infers each protein's membership in the different complexes. Extensive experimental tests and comparisons with similar algorithms demonstrate the validity of the model and the algorithm.
(4) A complex identification method based on temporal protein networks. For methods based on temporal protein networks, the key question is how to use gene expression data to construct a temporal protein network that objectively describes the dynamic activity of proteins. Existing methods generally assume that all proteins are dynamic. However, in addition to dynamic proteins, a cell also contains static proteins whose abundance remains relatively stable. This dissertation therefore proposes a new method that constructs a temporal protein network from a mixture of dynamic and static proteins and applies it to complex identification. The method considers not only the interactions between dynamic proteins but also the interactions between dynamic and static proteins.
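The abstract does not state how dynamic and static proteins are distinguished, or when a dynamic protein counts as active. The following sketch of item (4) therefore assumes two simple rules:

- A protein is static, and present at every time point, if its expression profile has a coefficient of variation below a hypothetical `cv_threshold`.
- A dynamic protein is present whenever its expression reaches its own mean.

Each time point then receives a sub-network containing only the interactions whose two partners are both present. Interactions between dynamic and static proteins are kept alongside dynamic-dynamic ones. The function name and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev


def temporal_networks(edges, expression, cv_threshold=0.2):
    """Split a static PPI network into one sub-network per time point (illustrative).

    expression maps each protein to its expression values over time; the values
    are assumed to be positive so that the coefficient of variation is defined.
    """
    n_times = len(next(iter(expression.values())))
    present = {}
    for protein, series in expression.items():
        mu = mean(series)
        cv = pstdev(series) / mu  # coefficient of variation
        if cv < cv_threshold:
            # Static protein (assumed rule): stable abundance, present at every time point.
            present[protein] = [True] * n_times
        else:
            # Dynamic protein (assumed rule): present when expression reaches its own mean.
            present[protein] = [x >= mu for x in series]
    networks = []
    for t in range(n_times):
        # An interaction survives at time t only if both partners are present, so
        # dynamic-dynamic, dynamic-static and static-static pairs can all appear.
        networks.append([(u, v) for u, v in edges
                         if u in present and v in present
                         and present[u][t] and present[v][t]])
    return networks


expression = {
    "P1": [5.0, 5.1, 4.9],  # nearly constant: classified as static
    "P2": [1.0, 8.0, 1.0],  # dynamic, peaks at the second time point
    "P3": [9.0, 1.0, 1.0],  # dynamic, peaks at the first time point
}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
for t, net in enumerate(temporal_networks(edges, expression)):
    print(t, net)
# 0 [('P1', 'P3')]
# 1 [('P1', 'P2')]
# 2 []
```

In the toy run, the static protein P1 pairs with whichever dynamic partner is active at each time point. That is the kind of dynamic-static interaction an all-dynamic model would lose whenever P1's expression dipped.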
Extensive experimental results show that the temporal protein network built by the proposed method improves the accuracy of protein complex identification.
To sum up, this dissertation focuses on complex detection based on protein networks and approaches the problem from several angles. It proposes several kinds of complex detection algorithms, based respectively on label propagation, a protein modularity function, the least-squares method, and temporal protein networks. This work has important research significance and potential application value for the study of both biological networks and proteomics.
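For the least-squares model in item (3), a much-simplified sketch is shown below. It assumes the model is the plain symmetric factorization A ≈ HH^T. Here A is the adjacency matrix of the protein network, and row i of the non-negative n × k matrix H holds protein i's membership strengths in k candidate complexes.

H is fitted with a commonly used damped multiplicative update for symmetric non-negative matrix factorization:

H ← H * (1/2 + 1/2 * (AH) / (HH^T H))

This rule is applied elementwise. The interaction weighting, the penalty term, and the multi-scale method described in the abstract are all omitted, and every function name is hypothetical.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product; matrices are lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def residual(A, H):
    """Least-squares objective ||A - H H^T||_F^2."""
    HHt = matmul(H, transpose(H))
    return sum((a - b) ** 2 for ra, rb in zip(A, HHt) for a, b in zip(ra, rb))


def fit_memberships(A, k, n_iter=300, seed=0, eps=1e-12):
    """Fit A ~ H H^T with H >= 0 using damped multiplicative updates (sketch)."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]  # strictly positive start
    for _ in range(n_iter):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        # Elementwise H <- H * (1/2 + 1/2 * (A H) / (H H^T H)); all factors are >= 0,
        # so H stays non-negative without any explicit projection.
        H = [[H[i][c] * (0.5 + 0.5 * AH[i][c] / (HHtH[i][c] + eps)) for c in range(k)]
             for i in range(n)]
    return H


# Two triangles joined by a single interaction C-D (toy data, not from the thesis).
names = ["A", "B", "C", "D", "E", "F"]
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
       ("D", "E"), ("D", "F"), ("E", "F")]
index = {p: i for i, p in enumerate(names)}
A = [[0.0] * len(names) for _ in names]
for u, v in toy:
    A[index[u]][index[v]] = A[index[v]][index[u]] = 1.0

H = fit_memberships(A, k=2)
print("residual:", round(residual(A, fit_memberships(A, 2, n_iter=0)), 3),
      "->", round(residual(A, H), 3))
for p in names:
    row = H[index[p]]
    print(p, [round(x / sum(row), 2) for x in row])  # normalized membership strengths
```

Because H gives each protein a strength for every complex rather than a single label, thresholding its normalized rows is one way such a model could assign a protein to more than one complex. The threshold itself would be another assumption.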
Keywords/Search Tags: protein network, protein complex, protein-protein interaction, label propagation, hierarchical clustering, least squares method