
Research On Identifying Protein Complexes Based On Hierarchical Clustering And Gene Ontology

Posted on: 2013-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
GTID: 2250330392967951
Subject: Computer Science and Technology
Abstract/Summary:
Protein complexes play a fundamental role in performing cellular functions. Therefore, determining how to effectively identify protein complexes from protein-protein interaction (PPI) networks is an important task in bioinformatics.

Protein complexes have inherent topological features, including high density and a core-attachment structure. Identifying protein complexes means mining subgraphs that conform to these features from large-scale PPI networks. This thesis focuses on three challenges facing current complex identification algorithms. The main research content is as follows.

First, this thesis transformed the complex identification problem into the community detection problem in complex networks. After analyzing the shortcomings of the classical GN algorithm and the HLC algorithm, this thesis proposed the HLC-CA algorithm, which integrates the HLC algorithm with the core-attachment structure. Experimental results showed that HLC-CA significantly outperforms other state-of-the-art algorithms. In addition, the derived time complexity of HLC-CA is relatively low, which suggests that HLC-CA is suitable for mining complexes from large-scale PPI networks.

Second, this thesis used semantic similarity, calculated from the Gene Ontology, to evaluate the reliability of PPIs and thereby address the high false-positive problem of PPI networks. Experimental results showed that the performance of current complex identification algorithms improves significantly on the processed PPI networks. In addition, this thesis combined two PPI networks to show that tackling the high false-negative problem of PPI networks can also improve the performance of current algorithms.

Third, this thesis proposed the SCGO algorithm, which is based on supervised learning. SCGO first extracted biological features from the Gene Ontology to represent protein complexes. It then built a protein complex classifier using positive and negative data sets of complexes. Finally, it employed the classifier to filter the results of current protein complex identification algorithms. Experimental results showed that the SCGO algorithm can significantly improve the precision of current algorithms.
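
The second contribution, scoring each PPI by the semantic similarity of its two proteins and discarding unreliable interactions, can be illustrated with a minimal sketch. The abstract does not say which Gene Ontology similarity measure or cutoff the thesis uses, so the sketch assumes a plain Jaccard overlap of annotation sets as a stand-in. The function names `go_jaccard` and `filter_ppi`, the threshold of 0.3, and the protein and term labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def go_jaccard(terms_a: Set[str], terms_b: Set[str]) -> float:
    """Jaccard overlap of two GO annotation sets.

    This is an assumed stand-in, not the similarity measure from the thesis.
    """
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def filter_ppi(
    edges: List[Edge],
    annotations: Dict[str, Set[str]],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the thesis
) -> List[Edge]:
    """Keep only interactions whose endpoint proteins are similar enough."""
    kept = []
    for u, v in edges:
        score = go_jaccard(annotations.get(u, set()), annotations.get(v, set()))
        if score >= threshold:
            kept.append((u, v))
    return kept


# Toy data: protein names and term labels are placeholders, not real annotations.
annotations = {
    "P1": {"GO:term_a", "GO:term_b", "GO:term_c"},
    "P2": {"GO:term_b", "GO:term_c"},
    "P3": {"GO:term_z"},
}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]

reliable = filter_ppi(edges, annotations, threshold=0.3)
print(reliable)
```

In this toy network only the P1-P2 interaction survives, because P3 shares no annotation with either partner. A complex identification algorithm would then run on the filtered edge list instead of the raw network.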
Keywords/Search Tags: protein complex identifying, hierarchical clustering, Gene Ontology, semantic similarity