Font Size: a A A

Research On Protein Complexes Identification Algorithms In Protein-protein Interaction Networks

Posted on:2017-07-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:B W CaoFull Text:PDF
GTID:1360330488477075Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In the post-genome era,with the rapid development of high-throughput technology,a vast amount of protein interaction network data has been generated.How to mine the meaning substructure from these large-scale protein interaction networks is one of the main hot research topics in bioinformatics.Protein Complexes may represent the main carrier of the cell component function,which is of great significance to help understand the structure of protein networks,as well as learn the tissues and processes in life activities of cell.In this thesis,the real protein interaction networks are taken as research objects and two types of protein complex discovery problems from an unweighted and weighted network are investigated respectively.For the unweighted networks,the density,size and characteristics path length of protein complex as the objective function is designed to detect biologically meaningful protein complexes.For the weighted networks,considering the fact that the accuracy of the existing algorithms is not ideal,different overlapping protein complex algorithms are designed to improve the existing algorithms.The main work of this thesis are as follows:(1)The majority of algorithms often employ single network topological property to identify protein complexes in protein interaction networks,we propose a novel MultiObjective Evolutionary Programming Genetic Algorithm(MOEPGA)which integrates multiple network topological properties to detect biologically meaningful protein complexes.we first systematically analyzes the multiobjective problem in terms of identifying protein complexes from PPI networks,and then constructs the objective function of the iterative algorithm based on three common topological properties of protein complexes from the benchmark dataset,the next generation subgraph population is designed with an edge mutation of subgraphs.Then,those non dominated subgraphs are regarded as protein complexes,the next generation subgraphs with some an appropriate select probability are generated.Comparing with the existing algorithms(ClusterONE,DPClus,MCODE,NEMO,SPICi and HC-PIN),experimental results show that the proposed method can not only find more protein complexes but also achieve significance results in terms of fscore and normalized clustering score in the DIP and GAVIN dataset.(2)Considering the fact that there are many false positives in protein network generated from high-through experimental techniques and the overlapping protein complexes are difficult to be identified.We propose a novel method called FK-Medoids that can detect overlapping protein complexes based on the substructures(pseudo-cliques)extension.Based on the fuzzy relation,we first detect non-overlapping pseudo-cliques with higher density and similarity from protein interaction networks,and then extend the pseudo-cliques with the improved K-Medoids algorithm,which employs the edge similarity as the distance between protein pairs,the overlapping protein complexes are detected with the aid of the upper and lower approximations of sets.Finally,we merge those overlapping protein complexes and filter those protein complexes with density less than 0.01.The proposed method is compared with several state-of-the-art algorithms in three datasets,which are Krogan,Gavin and Collins datasets,respectively.The experimental results show that the proposed method outperforms the other algorithms in Krogan and Gavin datasets according to the evaluation metrics of precision,F-measure and sensitivity.(3)The performance of FK-Medoids in detecting protein complexes in low density protein interaction networks is generally good,whereas in dense networks its performance is relatively poor.In addition,the running time of FK-Medoids is too long.We propose a novel algorithm called PCE-ONNFR to detect protein complexes based on pseudo-clique extension.First,the weights of PPI networks is constructed with the similarities of protein pairs PCE-ONNFR then forms the non-overlapping protein sub-structure based on fuzzy relation and expands each sub-structure by adding neighbor proteins to maximize the cohesive score.Finally,highly overlapped candidate protein complexes are merged to form the final protein complex set.Comparing with FK-Medoids,PCE-ONNFR algorithm is more efficient in the datasets of DIP,Collins,Krogan and Gavin.We further apply PCE-ONNFR to a human protein interaction network and demonstrate it is very effective in detecting protein complexes compared to the existing overlapping protein complex algorithms.(4)By considering the facts the majority of multi-information fusion methods need to adjust parameters with manual intervention,a new algorithm called cwMINE to identify protein complexes is proposed.The proposed algorithm integrates edge clustering coefficient and GO semantics of protein pairs by the means of harmonic mean to compute the edge weigh.And,the density of subgraphs based on the edge weights and the vertex weights are calculated,respectively.To further improve the prediction precision of protein complexes,a new expanding rule named expanding coefficient is proposed,which is embedded in the protein complex prediction process to filter low weight proteins.The proposed algorithm is validated on three yeast protein interaction networks from Krogan,Gavin and Collins datasets.The experimental results show that the proposed method outperforms the other algorithms(CMC,RRW,ProRank+ and HC-PIN)in most datasets in terms of the evaluation metrics.We further validate the effectiveness of our method on a human protein interaction to identify important disease-related Protein Complexes and provided valuable indications for disease treatment.
Keywords/Search Tags:Protein-Protein Interaction Network, Protein Complex, Multiobjective Evolutionary Programming, Fuzzy Relation, Weight
PDF Full Text Request
Related items