
ECPF: An Efficient Algorithm for Expanding Clustered Protein Families

Posted on: 2018-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zuo
Full Text: PDF
GTID: 2348330515996680
Subject: Engineering
Abstract/Summary:
With the rapid development of gene sequencing technology, the number of known protein sequences has exploded, and handling such a large volume of sequences has become a pressing concern. An effective solution is to cluster homologous sequences into separate protein families. Proteins that belong to the same family share similar structure and/or function, so well-characterized proteins provide valuable evidence for understanding proteins that are still unknown. However, the well-known clustering algorithms can barely meet the requirements of practical use. This thesis attempts to extend the results generated by these traditional algorithms. We first introduce the SiLiX and MCL algorithms, which are typical methods for clustering protein sequences into families, and analyze the advantages and disadvantages of each. We then present an efficient and effective algorithm called ECPF (Expanding Clustered Protein Families), which optimizes the clustered protein sequences. The results show that ECPF can discover previously unknown connections between families in large-scale databases at an acceptable cost in computation time and storage space. We also run ECPF on the “GOLD” standard database; the results generated by ECPF are the most similar to the reference results given by “GOLD”. ECPF successfully expands the protein sequence network and creates a more practical protein sequence topology for promoting biological research.
Keywords/Search Tags: cluster, protein family, protein link, expanded family, similarity score, protein sequence, protein function prediction