Research On Methods Of Protein Function Prediction Based On Protein Sequence And PPI Network

Posted on: 2017-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z X Teng
GTID: 1220330503469630
Subject: Computer Science and Technology
Abstract/Summary:
Protein function prediction is one of the most challenging tasks in bioinformatics in the post-genomic era. A large number of protein amino acid sequences (hereafter “sequences”) and protein-protein interactions (PPIs) have now been accumulated, which provide the basic conditions for understanding protein function. In this thesis, four computational problems are investigated in depth on the basis of protein sequences and PPIs: predicting protein functions from protein sequences, measuring functional similarity between proteins, constructing PPI networks, and predicting protein functions from PPI networks. The main contributions of this thesis are the following four parts:

(1) A method based on the domain composition of protein sequences is designed for predicting protein functions.

Traditional computational approaches to protein function prediction usually need biological information beyond the sequence, so they cannot predict the functions of proteins for which the sequence is the only known information. A domain is a conserved segment of a protein sequence and the basic functional, structural and evolutionary unit of proteins. Domains are commonplace in proteins, and the domain composition of a protein can be easily acquired. Thus, a method is proposed for predicting protein functions based on the domain composition of proteins. First, associations between domains and Gene Ontology (GO) terms are mined, and the relevance between them is measured by symmetrical conditional probability. Then, the associations between domains and terms are extended according to the semantic relationships among terms. Finally, on the basis of these domain-term associations, the functions of proteins are predicted from their domain composition. Compared with related methods, our method achieves higher precision and recall.
Additionally, our method can predict the functions of proteins using only their sequences and is not limited by the availability of other biological information.

(2) An approach based on the semantics of GO terms is put forward for measuring functional similarity between proteins.

GO terms are widely used to describe the functions of proteins. Measuring functional similarity between proteins by comparing the semantics of their GO terms is very beneficial for predicting protein function and for transferring functional information among proteins. Thus, many efforts have been made to measure functional similarity between proteins based on the semantics of GO terms. However, traditional methods neglect the semantic overlap between terms and therefore misjudge protein functional similarity. To address this problem, a novel method is put forward to measure functional similarity between proteins. First, the semantic information content of a GO term is estimated according to its semantic specificity and coverage. Then, the semantics of a GO term are divided into inherited semantics and extended semantics, and the semantic information content of a GO term set is measured based on the inherited and extended semantics of its terms. Finally, the semantic overlap ratio between the GO term sets of two proteins is taken as their functional similarity. Compared with traditional methods, our method achieves more accurate results and provides more reliable judgments of functional similarity between proteins.

(3) A method based on domain-domain interactions is developed to construct PPI networks.

Proteins perform their functions by collaborating with other proteins. A PPI network offers the chance to understand protein functions at the system level. However, PPI networks contain high rates of false-negative and false-positive PPIs. Because domain-domain interactions (DDIs) can mediate interactions between proteins, a new approach is designed to construct a PPI network based on DDIs.
First, DDIs are detected from the many combinations of domains that co-occur within proteins. Then, potential PPIs are predicted from these DDIs to enlarge the original network. After that, the PPIs in the enlarged network are reanalyzed to find DDIs across proteins. Finally, the reliabilities of the PPIs in the enlarged network are re-estimated. Compared with traditional methods, our method can construct a more complete and reliable PPI network. It can be used both to construct a new PPI network and to reconstruct an existing one.

(4) An approach based on the PPI network is proposed for predicting protein functions.

Studying protein functions on the PPI network can give a comprehensive understanding of the functional mechanisms of proteins, so predicting protein functions based on the PPI network has become a hot topic in biological research. Traditional methods assume that interacting proteins share the same functions. In fact, the functions of interacting proteins are very likely to differ. Thus, a method is developed based on the PPI network. First, the relationship between interacting proteins is abstracted as an active-passive relationship. Then, according to this relationship, GO terms on the same true path are selected to annotate the interacting proteins. Finally, an iterative algorithm is designed to predict the functions of the proteins in the PPI network. When applied to protein function prediction, our method performs better than the other related methods in terms of precision, recall and F-measure.
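
As an illustration of the relevance measure named in contribution (1), the sketch below scores domain-term pairs with the standard form of symmetrical conditional probability, SCP(d, t) = P(d, t)^2 / (P(d) P(t)). The abstract does not state the exact formula or data layout used in the thesis, so this form, the function name `scp_scores`, and the toy identifiers D1, D2, T1 and T2 are assumptions made only for this example.

```python
from collections import Counter
from itertools import product


def scp_scores(annotations):
    """Score every co-occurring (domain, term) pair with SCP.

    `annotations` is a list of (domains, go_terms) pairs, one per protein.
    This layout is a hypothetical one chosen for the example.
    """
    domain_count = Counter()  # number of proteins containing each domain
    term_count = Counter()    # number of proteins annotated with each term
    pair_count = Counter()    # number of proteins with both domain and term

    for domains, terms in annotations:
        domains, terms = set(domains), set(terms)
        domain_count.update(domains)
        term_count.update(terms)
        pair_count.update(product(domains, terms))

    # SCP(d, t) = P(d, t)^2 / (P(d) * P(t)).
    # With probabilities estimated as counts / N, the N factors cancel,
    # leaving count(d, t)^2 / (count(d) * count(t)).
    return {
        (d, t): c * c / (domain_count[d] * term_count[t])
        for (d, t), c in pair_count.items()
    }


# Toy data with placeholder identifiers (not real domain or GO accessions).
proteins = [
    ({"D1"}, {"T1"}),
    ({"D1", "D2"}, {"T1", "T2"}),
    ({"D2"}, {"T2"}),
]
scores = scp_scores(proteins)
```

In this toy data, D1 and T1 always occur together and so receive the maximum score, while D1 and T2 share only one of the two proteins that carry each of them and receive a lower score. How the thesis turns such scores into predictions, and how it extends associations along GO term relationships, is not specified in the abstract and is not modeled here.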
Keywords/Search Tags: protein sequence, function prediction, gene ontology, semantic similarity, protein-protein interaction network, domain-domain interaction