
Predicting Protein-protein Interactions And Studying The Related Contents Based On Network Topologies

Posted on: 2015-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 1220330479478686
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are fundamental to human life and seldom act alone. They interact with one another in specific ways to coordinate most cellular processes, so the analysis of protein interactions is the basis for understanding cellular organization and molecular function. A protein interaction network built from direct physical interactions represents a biological system explicitly and can help uncover tissue functions and structures, the pathogenesis of human diseases, and drug targets for gene therapy. High-throughput technologies generate large numbers of protein interactions, which supply the data for research based on protein interaction networks. Network topology measures associated with molecular functions and protein complexes make it possible to extract the functional and disease-related information hidden in a network.

At present, there are two main problems in studying functional modules and disease-related proteins based on protein interaction networks. The first is unreliable interactions in the networks, including noise (false positive interactions) and missing data (false negative interactions), which bias the resulting analyses. The second is that current computational approaches can behave inconsistently across networks derived from data of differing character, quality, and quantity; these approaches also need to improve in the quality and coverage of their predictions. Both problems lead to incomplete functional and disease modules in networks.

Based on known protein interaction networks, the thesis predicts protein-protein interactions and discovers topological, functional, and disease modules based on the kernels of cliques (maximal complete subnetworks). It proposes a framework model for repairing protein interaction networks by predicting reliable protein interactions.
Moreover, based on the framework, it predicts protein interactions associated with functional modules and protein complexes. It provides reliable predictions in networks containing false positive and false negative interactions. In addition, it helps identify larger functional and disease modules that are closer to the real modules in networks. The major research contents of the thesis comprise the following four parts:

1. A framework model based on the strategy of Loose In and Strict Out is proposed to predict reliable protein interactions. First, known computational methods for predicting protein interactions are partitioned into levels of reliability. Then, the framework model integrates several sub-methods according to rules of compatibility and complementarity between them. Finally, the predicted protein interaction set is determined through two processes, prediction and estimation. Every predicted interaction is supported by multiple kinds of biological significance and is therefore more reliable. The framework model guides the subsequent approaches to protein interaction prediction based on protein interaction networks.

2. Two kinds of approaches, with explicit and implicit patterns, are proposed to predict protein interactions based on the framework model. In the explicit mode, two prediction methods are designed. The first follows the standard model. It predicts reliable protein interactions despite interference from noisy datasets and adapts to datasets with differing network topological characteristics. The second is a simplified application of the framework. Its predictions are fed back into a new round of interaction prediction, which yields more predicted interactions while maintaining accuracy. Prediction sets that dominate in quality and in quantity, respectively, are obtained by using different gene ontology rules.
In the implicit mode, two algorithms are designed according to the differing topological characteristics of the detected protein complexes. The first algorithm is based on bridge-cut and the second on adaptive k-core pruning. The former targets complexes that contain multiple subnetworks and achieves high prediction accuracy. The latter is robust across most complexes detected by different approaches. Most of the predictions are associated with functional modules and protein complexes, which provides a basis for identifying functional and disease modules in the subsequent parts.

3. A method is proposed for identifying potential cliques based on competition in a candidate pool. Beyond the basic procedure of extending cliques, it exploits the hidden relationships between nodes in the candidate pool and applies a greedy rule to select the node that wins the chance to extend the clique. Furthermore, the candidate pool is no longer determined statically; it is rebuilt dynamically according to the currently extended clique. The process repeats until the final potential clique is identified. Most of the unknown protein interactions included in potential cliques can be validated by related tests, which indicates that the mined potential cliques are close to real cliques and have greater biological significance. The method can compensate for the deficiencies of protein interaction networks.

4. A method for predicting disease-related proteins is proposed based on the clique bone in protein interaction networks. First, a clique-extension method is used to mine potential cliques. Second, disease-related cliques are identified by a significance test on the ratio of disease proteins to normal proteins in a clique, and disease proteins are predicted from the disease-related cliques. Finally, the predicted disease proteins are scored using gene ontology annotations and are either retained or discarded.
The method copes well with the interference of false positive and false negative data. Its precision is good when judged by the relationship between genotypes and phenotypes. Moreover, the predicted disease proteins derived from the cliques are tightly associated with one another and relate to complex human diseases, such as various cancers. They can thus provide clues to the pathogenesis of serious diseases.

The first item is a model that provides a guiding framework for the thesis. The second item, predicting protein interactions, amounts to repairing the networks, which is fundamental to the subsequent mining of network modules. The third item, mining potential cliques, identifies functional modules in networks. The last item predicts disease-related proteins by building on the predicted interactions and extended modules in the networks.

The thesis addresses the problem that topological methods are easily disturbed by false positive and false negative data. Predicted protein interactions that are tightly associated with functional and disease modules help obtain relatively complete modules and contribute to more accurate study of biological problems.
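The clique-extension and disease-clique steps described in parts 3 and 4 can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the greedy rule, the candidate-pool criterion, or the significance test. The sketch assumes a candidate may lack at most `max_missing` links to the current clique, that the winner is the candidate with the most links into the clique (ties broken by links to other candidates), and that enrichment is measured with a one-sided hypergeometric test. The function names, the `max_missing` parameter, and the toy network are all hypothetical.

```python
from math import comb


def extend_clique(adj, seed, max_missing=1):
    """Greedily grow a seed clique into a 'potential clique'.

    Assumption: a node qualifies as a candidate if it lacks at most
    `max_missing` links to the current clique members.
    """
    clique = set(seed)
    while True:
        # The candidate pool is rebuilt from the current clique on every round.
        pool = {u for u in adj
                if u not in clique and len(clique - adj[u]) <= max_missing}
        if not pool:
            return clique
        # Assumed greedy rule: most links into the clique, then most links
        # to the other candidates; sorted() makes tie-breaking deterministic.
        winner = max(sorted(pool),
                     key=lambda u: (len(adj[u] & clique), len(adj[u] & pool)))
        clique.add(winner)


def missing_edges(adj, clique):
    """Pairs inside the potential clique with no known interaction."""
    nodes = sorted(clique)
    return [(a, b) for i, a in enumerate(nodes)
            for b in nodes[i + 1:] if b not in adj[a]]


def enrichment_p(n_total, n_disease, clique_size, hits):
    """One-sided hypergeometric p-value: P(X >= hits) disease proteins
    in a random set of `clique_size` proteins (an assumed choice of test)."""
    upper = min(n_disease, clique_size)
    favourable = sum(comb(n_disease, k) * comb(n_total - n_disease, clique_size - k)
                     for k in range(hits, upper + 1))
    return favourable / comb(n_total, clique_size)


# Toy network (hypothetical): A, B, C form a clique; D lacks only the C-D link.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

known_disease = {"A", "B", "D"}  # hypothetical annotations
potential = extend_clique(adj, {"A", "B", "C"})
predicted_links = missing_edges(adj, potential)
p_value = enrichment_p(len(adj), len(known_disease),
                       len(potential), len(potential & known_disease))
candidates = potential - known_disease  # would go on to GO-based scoring
```

In this toy case the interactions missing inside the potential clique play the role of predicted interactions, and the non-disease members of an enriched clique are the candidate disease proteins that the thesis then scores with gene ontology annotations. The seed should already be a clique of at least three proteins; with a smaller seed, the tolerance of one missing link would admit almost any node.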
Keywords/Search Tags: Protein interaction network, protein-protein interaction prediction, clique mining, functional modules, disease protein prediction