
Research on a Hadoop-Based GO Functional Annotation Platform with Homology Search

Posted on: 2014-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wu
Full Text: PDF
GTID: 2268330428458355
Subject: Computer application technology
Abstract/Summary:
With the widespread use of second-generation gene sequencing technology, the speed of gene sequencing has increased greatly. The resulting mass of biological data must be organized and annotated before it carries biological meaning. A large number of biological databases already exist to store and manage this information efficiently, and how to annotate new biological data using data that is already annotated has become an important area of bioinformatics. GO (Gene Ontology), which builds a cross-species annotation vocabulary to define gene functions and the relationships between them, has been widely used in annotation.

Meanwhile, in the face of massive biological data, how to process the data quickly and efficiently in parallel has become an active research topic. Many parallel computing frameworks exist, but the concept of cloud computing and the MapReduce parallel framework published by Google have been widely adopted in data processing because of their high scalability and ease of use. Hadoop, an open-source cloud computing system that implements the functionality described by Google, has been widely used by researchers. Combining bioinformatics with cloud computing technology, this thesis proposes and designs a Hadoop-based GO functional annotation platform with homology search. The platform is intended to make the study of gene data more convenient.

The main research work is as follows:

(1) We study the theoretical basis of Gene Ontology and the application of GO in bioinformatics, especially in gene function annotation. We then analyze the existing methods for annotating genetic data and the theoretical basis of functional annotation based on the similarity of homologous sequences.

(2) We study the gene function annotation process based on sequence similarity matching, and then the roles that the scoring matrix and the sequence alignment algorithm play in discovering homology. We also study and implement sequence alignment algorithms, including dot matrix, Needleman-Wunsch, and Smith-Waterman, and test and compare their performance.

(3) We propose an architecture for the Hadoop-based gene function annotation platform and design a local gene annotation database by integrating the GO database with other biological databases. We then design a conceptual model of annotation that establishes the association path between the ontology and the annotation information.

(4) We analyze the theory of the protein database search algorithm BLASTP and compare the running time of each stage of the algorithm. Based on the MapReduce parallel processing framework of Hadoop and the requirements that gene annotation places on the alignment algorithm, we design the parallel protein alignment algorithm CGABlastP. Experiments show that the algorithm substantially improves the speed of gene annotation and adapts to the demands of the exponential growth of biological sequence data.
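As a rough illustration of the local alignment step named in the abstract, the following is a minimal Smith-Waterman scoring sketch in plain Python. It is not the thesis's CGABlastP algorithm and does not use Hadoop. The function names (`smith_waterman_score`, `best_hit`), the flat match/mismatch scores, and the linear gap penalty are all assumptions chosen for brevity; a real protein search would normally use a substitution matrix instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the thesis.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(
                0,
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, database):
    """Return (sequence_id, score) of the highest-scoring database entry.

    `database` is a hypothetical dict mapping sequence ids to sequences.
    """
    scored = ((sid, smith_waterman_score(query, seq)) for sid, seq in database.items())
    return max(scored, key=lambda hit: hit[1])


if __name__ == "__main__":
    db = {"s1": "PAWHEAE", "s2": "HEAGAWGHEE"}
    print(best_hit("HEAGAWGHEE", db))
```

Because each query-versus-database comparison is independent of the others, this kind of scoring is a natural candidate for splitting across workers, which is the general idea behind running alignment under a MapReduce framework.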
Keywords/Search Tags: Gene Annotation, GO (Gene Ontology), Sequence Alignment, Cloud Computing, Hadoop, MapReduce