Research On Protein Interaction Prediction And Protein Function Prediction

Posted on: 2010-02-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q S Ni
GTID: 1100360305473657
Subject: Control Science and Engineering
Abstract/Summary:
Protein-protein interactions play an important role in many cellular processes. Studying protein-protein interactions and protein functions facilitates the understanding of life activities, clinical therapeutics, and pharmaceutical design. Recent advances in high-throughput experimental technologies have generated enormous amounts of data and provided valuable resources for studying protein interactions. However, these technologies suffer from high error rates because of their inherent limitations; moreover, the mechanism of protein interactions is complex, which poses a challenge for bioinformatics research. How to determine interactions between proteins effectively, construct complete protein interaction networks, and annotate protein functions accurately have become pressing problems. Focusing on protein-protein interactions, this dissertation studies protein interaction prediction and the prediction of protein function from protein-protein interactions. The main contents and contributions of the dissertation are summarized as follows:

(1) Methods for sequence-based protein interaction prediction. We investigate the feature encoding scheme for protein pairs and the classification methods used in protein interaction prediction from sequence features. First, a new sample encoding scheme for protein pairs, named the symmetrical encoding scheme (SYES), is developed, by which a single protein pair is mapped to two symmetrical points in the sample space. SYES can fully utilize the feature information of each protein in a pair and improves prediction performance. Second, two pattern classification methods for protein interaction prediction are improved. Specifically, the kernel method is coupled with k-local hyperplanes (HKNN) to develop a new method, kernel k-local hyperplanes (KHKNN), for predicting protein-protein interactions.
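The symmetrical encoding idea described above can be illustrated with a minimal sketch. The abstract does not say how the two symmetrical points are built, so the concatenation used here, and the helper names `symmetric_encode` and `build_training_set`, are assumptions for illustration only and not the dissertation's actual SYES definition.

```python
def symmetric_encode(features_a, features_b):
    """Map one protein pair (A, B) to two points: [A | B] and [B | A].

    Assumption: each protein is already described by a fixed-length
    numeric feature vector derived from its sequence.
    """
    forward = list(features_a) + list(features_b)
    backward = list(features_b) + list(features_a)
    return forward, backward


def build_training_set(pairs, labels):
    """Expand every labelled pair into its two symmetrical samples."""
    samples, targets = [], []
    for (features_a, features_b), label in zip(pairs, labels):
        for point in symmetric_encode(features_a, features_b):
            samples.append(point)
            targets.append(label)  # both points share the pair's label
    return samples, targets


# Hypothetical toy features for two proteins.
pairs = [([1.0, 2.0], [3.0, 4.0])]
samples, targets = build_training_set(pairs, [1])
```

Under this reading, a classifier trained on both points sees the pair in either order, so its prediction need not depend on which protein happens to be listed first.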
Moreover, a new local support vector machine (LSVM) method is presented for predicting protein-protein interactions; it takes the local properties of the interaction data into account by constructing support vector machines in the neighborhood of each test sample. These two methods provide new solutions for protein interaction prediction.

(2) Methods for domain interaction prediction. Domains are the structural and functional building blocks of proteins, and proteins interact with one another through specific domains, which makes identifying domain interactions important for understanding protein interactions at the domain level. A new model, named the support-oppose (SO) model, is proposed to predict domain interactions: each domain pair is assigned two scores, evaluated by a support model and an oppose model respectively, and the two scores are fused to determine the likelihood that the domains interact. Experimental results on a large-scale protein interaction dataset demonstrate that the SO model is a useful method for predicting domain interactions.

(3) The effect of interaction data quality on predicting protein function from protein-protein interactions. Protein interaction samples obtained from experiments differ in quality, yet traditional methods treat every interaction sample equally when predicting protein function and seldom take sample quality into account. In this dissertation, we investigate the effect of the quality of protein interaction data on protein function prediction. Moreover, two improved methods, the weighted neighbor counting method (WNC) and the weighted chi-square method (WCHI), are proposed by incorporating the quality of interaction samples into the neighbor counting method (NC) and the chi-square method (CHI).
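To illustrate how interaction quality might enter neighbor counting, the following sketch scores each candidate function by the summed reliability of the interactions that link a protein to neighbors carrying that function. The abstract does not give the weighting formula, so the per-edge reliability weights, the function name `weighted_neighbor_counting`, and the toy data are all assumptions; setting every weight to 1 reduces the sketch to plain neighbor counting.

```python
from collections import defaultdict


def weighted_neighbor_counting(protein, edges, annotations):
    """Rank candidate functions for `protein`.

    edges:       iterable of (protein_a, protein_b, reliability) tuples
    annotations: dict mapping a protein to its set of known functions
    Returns a list of (function, score), highest score first.
    """
    scores = defaultdict(float)
    for a, b, weight in edges:
        if a == protein:
            neighbor = b
        elif b == protein:
            neighbor = a
        else:
            continue  # edge does not touch the query protein
        for function in annotations.get(neighbor, ()):
            scores[function] += weight
    # Sort by descending score, then by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: one reliable edge versus two unreliable ones.
edges = [("P1", "P2", 0.9), ("P1", "P3", 0.2), ("P1", "P4", 0.3)]
annotations = {"P2": {"kinase"}, "P3": {"transport"}, "P4": {"transport"}}
ranking = weighted_neighbor_counting("P1", edges, annotations)
```

In this toy case, unweighted counting would favor "transport" (two neighbors), whereas the weighted score favors "kinase" because its single supporting interaction is far more reliable.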
WNC and WCHI make effective use of the quality of protein interaction data and reduce the negative effect of errors in the interaction data, which markedly improves prediction performance.

(4) Methods for protein function prediction from protein interaction networks. Proteins that interact with each other are likely to share the same or similar functions, which makes it possible to infer the functions of uncharacterized proteins from protein interaction networks. A new general global optimal framework (GGOF) is presented for predicting protein function from protein interaction networks; it also considers the functional similarity between proteins that are some distance apart in the network. In GGOF, we define an open objective function and present a general procedure for minimizing it. Moreover, a new protein function prediction method based on GGOF and random walk with restart (GGOF-RWR) is proposed, and experimental results show that GGOF-RWR performs better than, or at least comparably to, several previously developed methods. Furthermore, logistic regression is used to predict protein functions, and feature selection techniques are studied to improve prediction performance. The experimental results show that logistic regression predicts protein function effectively, and that feature selection not only reduces feature dimensionality and eliminates redundancy but also reveals relationships between functions in the protein interaction networks and improves prediction performance.
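The random walk with restart named in contribution (4) can be sketched in its generic textbook form. This shows only the standard RWR iteration on an unweighted graph; how the dissertation combines it with the GGOF objective is not stated in the abstract, and the function name, the restart probability of 0.3, and the toy graph are assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walker that
    jumps back to `seed` with probability `restart` at every step.

    adjacency: dict mapping every node to a list of its neighbors
               (each neighbor must also appear as a key).
    """
    nodes = sorted(adjacency)
    prob = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(max_iter):
        # Restart mass always returns to the seed.
        nxt = {node: (restart if node == seed else 0.0) for node in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            moving = (1.0 - restart) * prob[node]
            if not neighbors:
                nxt[seed] += moving  # isolated node: send mass to seed
                continue
            share = moving / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        change = sum(abs(nxt[node] - prob[node]) for node in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical path-shaped network: A - B - C - D.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
affinity = random_walk_with_restart(network, "A")
```

The resulting probabilities decay with network distance from the seed protein, which is one way to let proteins that are not direct neighbors still contribute, with less influence, to a function prediction.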
Keywords: Proteomics, Protein-protein interactions, Domain-domain interactions, Protein function prediction, Machine learning, Data quality, Kernel method, Logistic regression