The Implementation Of 3 Protein GO Annotation Methods Based On Linux Cluster

Posted on: 2009-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: F Ye
GTID: 2120360275472435 | Subject: Bio-IT
Abstract/Summary:
Once large amounts of experimental data have been obtained, the next important task is to extract useful information from them as quickly as possible for biologists to analyze. This requires a high-performance bioinformatics analysis platform. Protein function prediction is one of the main research areas of the post-genome era. GO (Gene Ontology) is a dynamic controlled vocabulary structured as a DAG (Directed Acyclic Graph). Because it describes protein functions and the relationships between them precisely, it is widely used in protein function annotation.

This study predicts protein GO functions with the following three methods: (1) Homology search based on the BLAST alignment programs (blastp, psi-blast): UniProt keywords are extracted from the BLAST results and mapped to GO terms. (2) Search of protein models, families, and structural domains based on the integrated project InterPro: GO terms are extracted from the InterProScan results. (3) Classification by protein sequence characteristics and physicochemical characteristics, based on GOKey, a software package that implements an SVM (Support Vector Machine): GO terms are extracted from the GOKey results.

The programs and database resources used in this study include UniProt, RefSeq, , InterPro, and Ensembl. We have completed the GO annotation of the Ensembl novel protein database and provide web-based querying and presentation of protein information. To build an automated annotation platform, we set up installation and automatic updating of the BLAST programs, InterProScan, GOKey, and the alignment databases on a Linux cluster, and provided a web interface to these tools and their results. To make full use of the cluster's parallel computing capability, the web interface divides each submitted task into subtasks. Tests show that running on the cluster in parallel reduces the computing time required by the protein GO annotation methods.
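The abstract does not say how the web interface divides a submitted task. One plausible reading, offered here only as a minimal sketch, is that a multi-sequence FASTA submission is split into several smaller FASTA inputs so that each cluster node can run the annotation tool on its own share. The function names (parse_fasta, split_records, to_fasta) and the round-robin strategy are assumptions made for illustration, not the thesis's implementation.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty chunks.

    Hypothetical division strategy: record i goes to chunk i % n_chunks.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    submitted = ">p1\nMKV\nLLA\n>p2\nGGH\n>p3\nTTS\n"
    for index, chunk in enumerate(split_records(parse_fasta(submitted), 2)):
        # Each chunk would be handed to one cluster job (job submission not shown).
        print(f"--- chunk {index} ---")
        print(to_fasta(chunk), end="")
```

Round-robin assignment keeps the chunks close to equal in record count. A scheduler that balanced by total sequence length instead would be a reasonable alternative when sequence lengths vary widely.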
Keywords/Search Tags: Protein Function Annotation, Gene Ontology, Linux Cluster, Parallel Computing