Distributed search of biological databases using Hadoop/MapReduce

Posted on:2016-07-11

Degree:M.S.

Type:Thesis

University:Morgan State University

Candidate:Fashola, Babatunde Olaide

GTID:2478390017977020

Subject:Bioinformatics

Abstract/Summary:

The main goals of this thesis research were to:

1. Set up a computational platform/environment for the thesis research.
2. Develop a MapReduce search algorithm that uses the scalability of a Hadoop cluster and the MapReduce functionalities to make searching a biological database faster.
3. Implement the MapReduce search algorithm in the Java programming language and run the resulting Java application on a multi-node Hadoop cluster in the cloud.
4. Compare the execution times of the MapReduce search program and the serial search programs (the Boyer-Moore and Knuth-Morris-Pratt algorithms).

13 GB of downloadable GenBank data was processed with the Hadoop framework installed on a 12-node cluster composed of Amazon EC2 t2.micro instances. The distributed search program ran 46% faster than the serial programs. Hence, the search algorithms currently used to access biological databases can incorporate the MapReduce programming model to improve their performance.
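To make the comparison in the abstract concrete, the following is a minimal sketch of the general idea, not the thesis's implementation. It assumes a plain single-JVM setting with no Hadoop classes: a Knuth-Morris-Pratt matcher stands in for the serial search, and a Java parallel stream imitates the map step (count matches per record) and the reduce step (sum the counts). The class name `PatternSearchSketch`, the method names, and the sample records are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Toy single-JVM illustration; no Hadoop APIs are used here. */
final class PatternSearchSketch {

    /** KMP failure table: fail[i] is the length of the longest proper
     *  prefix of pattern[0..i] that is also a suffix of it. */
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Serial search: counts occurrences of pattern in text (overlaps included). */
    static long kmpCount(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = failureTable(pattern);
        long count = 0;
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                count++;
                k = fail[k - 1]; // keep going so overlapping matches are found
            }
        }
        return count;
    }

    /** Map-style step (per-record count) followed by a reduce-style step (sum). */
    static long parallelCount(List<String> records, String pattern) {
        return records.parallelStream()
                .mapToLong(record -> kmpCount(record, pattern))
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical sequence records, made up for illustration.
        List<String> records = Arrays.asList("ACGTACGT", "TTACGA", "GGGG");
        System.out.println(parallelCount(records, "ACG")); // prints 3
    }
}
```

Because each record is searched independently, a match that spans two records is not counted. A real distributed job would have to decide how to split the input so that this does not lose results.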

Keywords/Search Tags:

Search, MapReduce, Biological databases, Hadoop

Related items

1	Biological sequence analysis using Hadoop/MapReduce as a distributed computing model
2	The Research And Application Of Search Engine Based On Hadoop
3	Design And Implementation Of Vertical Search Engine Based On Hadoop
4	Research On The Performance And Optimization Of MapReduce Model In Hadoop Platform
5	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
6	Sequence and structure similarity search in biological and XML databases
7	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
8	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
9	Research And Implementation Of Distributed Web Crawl Based On Hadoop Architecture
10	The Performance Optimization And Improvement Of MapReduce In Hadoop