Identifying Protein Complexes And Functional Modules In Protein Interaction Networks

Posted on: 2009-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Li
GTID: 1100360278954056
Subject: Computer application technology
Abstract/Summary:
In the post-genome era, one of the most important challenges is to systematically analyze and comprehensively understand how proteins carry out life activities by interacting with one another. Analyzing the topological characteristics of protein interaction networks, identifying protein complexes and functional modules, and predicting the functions of proteins of unknown function have become central problems in research both in China and abroad. This dissertation first studies the topological characteristics of protein interaction networks and then, building on characteristics shared by the networks of different species, proposes several effective algorithms for detecting protein complexes and functional modules. The main original contributions are as follows.

Complex network theory and graph techniques are applied to analyze the topological characteristics of protein interaction networks from different species, including the degree distribution, degree-degree correlation, network diameter, characteristic path length, edge betweenness, range, and reliability. Several characteristics common to the networks of different species are identified, which provide a foundation for developing sound algorithms for mining protein complexes and functional modules.

The protein-protein interaction data currently available are incomplete. Mining maximal cliques alone is too restrictive for predicting protein complexes, because it is unlikely that every pair of proteins in a large complex interacts directly. To overcome this limitation, a new algorithm for identifying protein complexes based on maximal clique extension (IPC-MCE) is proposed; it is simple to implement and effective. Applied to the protein interaction network of Saccharomyces cerevisiae, IPC-MCE identifies many well-known protein complexes. Moreover, IPC-MCE is not sensitive to its input parameter.

Based on our observation that most shortest paths between proteins within the same complex have length at most two, we propose a new algorithm, IPC-DM, that identifies protein complexes using a distance measure. Experimental results show that IPC-DM recalls more known complexes than other previously proposed clustering algorithms and achieves relatively higher sensitivity, specificity, and F-measure. IPC-DM is also robust to the high rates of false positives and false negatives known to affect data from high-throughput interaction techniques. It can therefore be applied to protein interaction networks with high false-positive and false-negative rates to identify new protein complexes and to provide references for biologists studying protein complexes.

Hierarchical clustering algorithms based on betweenness are too time-consuming to be used on large protein interaction networks. We introduce the edge clustering coefficient, a local metric, and propose a fast hierarchical clustering algorithm based on it, FAG-EC. To reduce the effect of noisy data on the clustering results, we further propose HC-Wpin, an algorithm for hierarchical clustering of weighted protein interaction networks: a logistic regression-based scheme assigns a weight to each edge, and the edge clustering coefficient and the notion of a functional module are redefined for weighted graphs. All identified functional modules are validated against the three Gene Ontology (GO) annotation categories: Biological Process, Molecular Function, and Cellular Component. Experimental results show that FAG-EC and HC-Wpin not only detect significant functional modules in protein interaction networks but also accurately identify hierarchically organized functional modules as the parameter value is varied. Both algorithms are also extremely fast, so they can be applied to the even larger protein interaction networks of higher organisms as protein-protein interaction data accumulate rapidly.

Following the "centrality-lethality rule" that generally holds in protein interaction networks, a graph split-and-reduction model is proposed, and a new algorithm based on this model, OMFinder, is developed to identify overlapping functional modules. Experimental results show that OMFinder detects many significant overlapping functional modules, with an overlapping rate between different functional modules of about 2. Compared with other algorithms for detecting overlapping functional modules, OMFinder performs better and has a lower discard rate.

The clustering algorithms proposed in this dissertation approach the problem from different perspectives and effectively solve several problems that arise when clustering protein interaction networks. They can be implemented efficiently and achieve good clustering performance, and the protein complexes and functional modules they identify are shown to be statistically significant. The functions of a number of uncharacterized proteins are predicted, which can provide references for biologists planning biochemical experiments. Moreover, the proposed clustering algorithms can be generalized to other complex networks with similar structures.
Keywords/Search Tags: systems biology, protein interaction network, clustering, protein complex, functional module, prediction of protein function
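Several of the topological quantities listed in the abstract can be computed with breadth-first search. The Python sketch below is an illustration, not the dissertation's code: it assumes an unweighted, undirected network given as a list of interaction pairs, and the helper names (`build_graph`, `degree_distribution`, `mean_neighbor_degree`, `path_statistics`) are hypothetical. Degree-degree correlation is probed through the mean neighbour degree k_nn(k), and path statistics are averaged over connected pairs only.

```python
from collections import Counter, deque


def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """P(k): fraction of proteins with exactly k interaction partners."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def mean_neighbor_degree(adj):
    """k_nn(k): mean degree of the neighbours of degree-k proteins.

    A k_nn(k) that falls as k grows means hubs tend to link to low-degree
    proteins (a disassortative degree-degree correlation).
    """
    groups = {}
    for nbrs in adj.values():  # every node built from an edge has >= 1 neighbour
        avg = sum(len(adj[w]) for w in nbrs) / len(nbrs)
        groups.setdefault(len(nbrs), []).append(avg)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def bfs_distances(adj, source):
    """Shortest-path lengths, in interactions, from source to each reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_statistics(adj):
    """Characteristic path length (mean over connected pairs) and diameter."""
    total = pairs = diameter = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


g = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
print(degree_distribution(g))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(path_statistics(g))      # (1.3333333333333333, 2)
```

On a network with several connected components, the diameter reported here is the longest finite shortest path. Edge betweenness would additionally require counting shortest paths (for example with Brandes' algorithm) and is left out of this sketch.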
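The abstract motivates IPC-MCE (maximal cliques are too strict for incomplete data) but does not give its extension rule. The Python sketch below only illustrates the general clique-extension idea under an assumed rule: maximal cliques of at least `min_clique` proteins are enumerated with the Bron-Kerbosch algorithm (with pivoting), and each clique is extended by every outside neighbour adjacent to at least a fraction `lam` of the clique's members. The parameters `min_clique` and `lam` and all function names are hypothetical; candidates nested inside a larger candidate are discarded.

```python
def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """All maximal cliques, via Bron-Kerbosch with pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # nothing can be added to r: it is maximal
            return
        # Branch only on candidates not adjacent to the best-covering pivot.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def extend_clique(adj, clique, lam):
    """Assumed rule: add each outside neighbour linked to >= lam * |clique| members."""
    clique = set(clique)
    candidates = set().union(*(adj[u] for u in clique)) - clique
    return clique | {w for w in candidates if len(adj[w] & clique) >= lam * len(clique)}


def clique_extension_complexes(adj, min_clique=3, lam=0.5):
    """Extended maximal cliques, dropping any candidate nested inside another."""
    seeds = [c for c in maximal_cliques(adj) if len(c) >= min_clique]
    extended = {frozenset(extend_clique(adj, c, lam)) for c in seeds}
    return [c for c in extended if not any(c < other for other in extended)]


adj = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
                   ("C", "D"), ("E", "A"), ("E", "B"), ("F", "D"),
                   ("X", "Y"), ("Y", "Z"), ("X", "Z")])
for complex_ in sorted(sorted(c) for c in clique_extension_complexes(adj)):
    print(complex_)
# ['A', 'B', 'C', 'D', 'E']
# ['X', 'Y', 'Z']
```

In the toy network, protein E is not in the 4-clique {A, B, C, D} but touches half of it, so the assumed rule pulls it in, while F (one link) stays out. The abstract reports that IPC-MCE is insensitive to its parameter; nothing here shows whether this assumed rule shares that property.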
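The observation behind IPC-DM, that proteins in the same complex are mostly at most two interactions apart, is easy to check on a candidate complex. The Python sketch below measures that fraction and, purely as an illustration, grows a cluster greedily by admitting a neighbouring protein only if it lies within two hops of every current member. The growth rule and all function names are assumptions for illustration, not IPC-DM's published procedure.

```python
def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def within_two_hops(adj, u):
    """Proteins whose shortest-path distance from u is 1 or 2."""
    one = adj[u]
    two = set().union(*(adj[w] for w in one))
    return (one | two) - {u}


def fraction_within_two(adj, members):
    """Share of member pairs whose shortest path in the whole network has length <= 2."""
    members = sorted(members)
    pairs = close = 0
    for i, u in enumerate(members):
        reach = within_two_hops(adj, u)
        for v in members[i + 1:]:
            pairs += 1
            if v in reach:
                close += 1
    return close / pairs if pairs else 1.0


def grow_cluster(adj, seed):
    """Hypothetical greedy growth: admit a neighbour only if it is within two hops
    of every current member, including members admitted earlier in the same pass."""
    reach = {u: within_two_hops(adj, u) for u in adj}
    cluster = {seed}
    changed = True
    while changed:
        changed = False
        frontier = set().union(*(adj[u] for u in cluster)) - cluster
        for w in sorted(frontier):  # sorted for a deterministic result
            if all(w in reach[m] for m in cluster):
                cluster.add(w)
                changed = True
    return cluster


adj = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
print(fraction_within_two(adj, {"A", "B", "C"}))       # 1.0
print(fraction_within_two(adj, {"A", "B", "C", "D"}))  # 0.8333333333333334
print(sorted(grow_cluster(adj, "C")))                  # ['B', 'C', 'D']
```

On the path A-B-C-D-E, the cluster grown from C stops at {B, C, D} because A and E are three hops from D and B respectively, which is the kind of distance constraint the abstract's observation suggests.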
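The abstract compares algorithms by how many known complexes they recall and by sensitivity, specificity, and F-measure, but it does not state how a predicted complex is matched to a known one. A widely used scheme scores a predicted complex P against a known complex K by the overlap score |P & K|^2 / (|P| |K|) and counts them as matched when the score reaches a threshold, commonly 0.2. The Python sketch below uses that scheme to compute precision, recall, and their harmonic mean, the F-measure; the threshold default and the function names are assumptions, and the dissertation's own sensitivity and specificity definitions may differ.

```python
def overlap_score(predicted, known):
    """|P & K|^2 / (|P| * |K|): 1.0 for identical sets, 0.0 for disjoint ones."""
    p, k = set(predicted), set(known)
    return len(p & k) ** 2 / (len(p) * len(k))


def match_statistics(predicted, known, omega=0.2):
    """Precision, recall and F-measure under an overlap-score threshold omega."""
    # Predicted complexes that match at least one known complex.
    matched_pred = sum(1 for p in predicted
                       if any(overlap_score(p, k) >= omega for k in known))
    # Known complexes recalled by at least one prediction.
    matched_known = sum(1 for k in known
                        if any(overlap_score(p, k) >= omega for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_known / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = [{"A", "B", "C"}, {"X", "Y"}]
known = [{"A", "B", "C", "D"}, {"P", "Q", "R"}]
print(match_statistics(predicted, known))  # (0.5, 0.5, 0.5)
```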
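A common definition of the edge clustering coefficient, due to Radicchi et al., is ECC(u, v) = (z_uv + 1) / min(k_u - 1, k_v - 1), where z_uv is the number of triangles containing the edge (u, v) and k is degree. That FAG-EC uses exactly this form is an assumption here. The Python sketch computes it and then, as a simplified stand-in for FAG-EC rather than the published algorithm, merges the endpoints of edges in descending ECC order until the coefficient falls below `threshold`, a hypothetical parameter. Lowering the threshold merges more edges, which is one way to read modules at different levels of a hierarchy.

```python
import math


def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """(z_uv + 1) / min(k_u - 1, k_v - 1); infinite when an endpoint has degree 1."""
    z = len(adj[u] & adj[v])  # triangles through edge u-v
    denom = min(len(adj[u]), len(adj[v])) - 1
    return math.inf if denom == 0 else (z + 1) / denom


def agglomerate_by_ecc(adj, threshold):
    """Merge endpoints of every edge with ECC >= threshold (union-find)."""
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    # Descending order mirrors an agglomerative merge sequence; with a fixed
    # threshold the final partition is the same in any order.
    scored = sorted(((edge_clustering_coefficient(adj, u, v), u, v) for u, v in edges),
                    reverse=True)
    for score, u, v in scored:
        if score < threshold:
            break
        parent[find(u)] = find(v)
    modules = {}
    for u in adj:
        modules.setdefault(find(u), set()).add(u)
    return list(modules.values())


adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
                   ("D", "E"), ("E", "F"), ("D", "F")])
print(edge_clustering_coefficient(adj, "C", "D"))  # 0.5
print(sorted(sorted(m) for m in agglomerate_by_ecc(adj, threshold=1.0)))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
print(len(agglomerate_by_ecc(adj, threshold=0.5)))  # 1
```

Because the coefficient only looks at the two endpoints and their shared neighbours, each edge is scored in time proportional to the endpoint degrees, which is why ECC-based clustering avoids the global shortest-path computations that betweenness-based methods need.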
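HC-Wpin weights each interaction with a logistic regression-based scheme. The abstract does not list the features used, so the Python sketch below treats them as hypothetical numeric evidence per interaction (for example, how many experiments report it) with made-up training labels, and fits the coefficients by plain batch gradient descent on the log-loss. The resulting weight w = 1 / (1 + exp(-(b0 + sum_i b_i x_i))) lies in (0, 1) and can be read as an estimated confidence that the interaction is real.

```python
import math


def logistic_weight(features, coefs, intercept):
    """w = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))), computed without overflow."""
    score = intercept + sum(b * x for b, x in zip(coefs, features))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Unregularised batch gradient descent on the mean log-loss."""
    n = len(rows)
    coefs, intercept = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_b0, grad = 0.0, [0.0] * len(coefs)
        for x, y in zip(rows, labels):
            err = logistic_weight(x, coefs, intercept) - y  # d(log-loss)/d(score)
            grad_b0 += err
            for i, xi in enumerate(x):
                grad[i] += err * xi
        intercept -= lr * grad_b0 / n
        coefs = [b - lr * g / n for b, g in zip(coefs, grad)]
    return coefs, intercept


# Hypothetical training data: one feature per interaction (for example the
# number of experiments reporting it) and a 0/1 label from a reference set.
rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
coefs, b0 = fit_logistic(rows, labels)
weights = [logistic_weight(x, coefs, b0) for x in rows]
print([round(w, 2) for w in weights])  # weights rise with the feature value
```

In practice a library fit with regularisation would replace the hand-written loop; the loop is kept here so the sketch has no dependencies.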
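The abstract names OMFinder's graph split-and-reduction model but not its steps. The Python toy below is only a loosely related illustration of how removing central proteins can expose overlapping modules, not OMFinder itself: taking degree as the centrality measure (an assumption), it removes the `n_hubs` most connected proteins, treats the remaining connected components of at least `min_size` proteins as modules, and then re-attaches each removed hub to every module it interacts with. The function name and both parameters are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def split_and_reattach(adj, n_hubs, min_size=2):
    """Toy split-and-reattach: remove hubs, take components, add hubs back to touched modules."""
    # Degree stands in for centrality; ties are broken by name for determinism.
    hubs = set(sorted(adj, key=lambda u: (-len(adj[u]), u))[:n_hubs])
    remaining = set(adj) - hubs
    seen, modules = set(), []
    for start in sorted(remaining):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # BFS restricted to non-hub proteins
            u = queue.popleft()
            for w in adj[u]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) >= min_size:
            modules.append(component)
    for module in modules:
        # A hub joins every module it interacts with, so modules may overlap.
        module |= {h for h in hubs if adj[h] & module}
    return modules


adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"),
                   ("D", "F"), ("H", "A"), ("H", "B"), ("H", "D"), ("H", "E")])
print(sorted(sorted(m) for m in split_and_reattach(adj, n_hubs=1)))
# [['A', 'B', 'C', 'H'], ['D', 'E', 'F', 'H']]
```

Here hub H bridges two triangles; once it is set aside the triangles separate, and re-attaching H places it in both modules.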