Distributed search of biological databases using Hadoop/MapReduce
Posted on: 2016-07-11
Degree: M.S.
Type: Thesis
University: Morgan State University
Candidate: Fashola, Babatunde Olaide
GTID: 2478390017977020
Subject: Bioinformatics
Abstract/Summary:
The main goals of this thesis research were to:
1. Build a computational platform/environment for the research.
2. Develop a MapReduce search algorithm that exploits the scalability of a Hadoop cluster and the MapReduce functionality to speed up searches of a biological database.
3. Implement the MapReduce search algorithm in Java and run the resulting Java application on a multi-node Hadoop cluster in the cloud.
4. Compare the execution time of the MapReduce search program with that of two serial search programs, one based on the Boyer-Moore algorithm and one based on the Knuth-Morris-Pratt algorithm.

A total of 13 GB of downloadable GenBank data was processed with the Hadoop framework installed on a 12-node cluster of Amazon EC2 t2.micro instances. The distributed search program ran 46% faster than the serial programs. Hence, the search algorithms currently used to access biological databases can incorporate the MapReduce programming model to improve their performance.
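One of the serial baselines named in the abstract is the Knuth-Morris-Pratt (KMP) algorithm. The sketch below is a textbook KMP in Java, not the thesis's own program; the class name `KmpSearch` and the choice to report every (possibly overlapping) match position are assumptions made for illustration. KMP precomputes a failure table so that, after a mismatch, it never re-reads text characters, which gives O(n + m) time for a text of length n and a pattern of length m.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a textbook KMP sketch, not the thesis's code.
public class KmpSearch {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1]; // fall back to a shorter matching prefix
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the start index of every (possibly overlapping) occurrence of pattern in text.
    public static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                hits.add(i - k + 1);  // full match ends at i
                k = fail[k - 1];      // keep going to catch overlapping matches
            }
        }
        return hits;
    }
}
```

For example, `KmpSearch.search("ACGTACGTTACG", "ACG")` returns the positions 0, 4 and 9.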
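The other serial baseline is the Boyer-Moore algorithm. The thesis does not say which Boyer-Moore variant it used, so the following is only a simplified sketch that applies the bad-character rule alone and omits the good-suffix rule of full Boyer-Moore. The class name `BoyerMooreBadChar` is hypothetical. The pattern is compared right to left, and on a mismatch it is shifted so that the mismatched text character lines up with its last occurrence in the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; simplified Boyer-Moore (bad-character rule only).
public class BoyerMooreBadChar {
    public static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || m > n) {
            return hits;
        }
        // last[c] = index of the last occurrence of char c in pattern, or -1 if absent.
        int[] last = new int[Character.MAX_VALUE + 1];
        Arrays.fill(last, -1);
        for (int j = 0; j < m; j++) {
            last[pattern.charAt(j)] = j;
        }
        int s = 0; // current alignment of the pattern against the text
        while (s <= n - m) {
            int j = m - 1;
            // Compare the pattern to the text from right to left.
            while (j >= 0 && pattern.charAt(j) == text.charAt(s + j)) {
                j--;
            }
            if (j < 0) {
                hits.add(s);
                // After a full match, align the next text character with its
                // last occurrence in the pattern (or shift by 1 at the end of text).
                s += (s + m < n) ? m - last[text.charAt(s + m)] : 1;
            } else {
                // Bad-character shift; always move forward by at least 1.
                s += Math.max(1, j - last[text.charAt(s + j)]);
            }
        }
        return hits;
    }
}
```

On DNA text the alphabet is small, so bad-character shifts tend to be short; the good-suffix rule of full Boyer-Moore can help in that setting, which is one reason this simplified sketch should not be read as a faithful reproduction of the thesis's baseline.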
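The MapReduce search itself ran on Hadoop, whose `Mapper`/`Reducer` classes need the Hadoop libraries and a cluster. The plain-Java sketch below only simulates the same three phases (map, shuffle, reduce) in a single JVM with a parallel stream, so it can run on its own. It is not the thesis's program. The record layout (a map from a hypothetical record ID to its sequence), the key/value choice (the pattern as key, `"recordId:offset"` as value), and all names are assumptions. Each record is scanned with `String.indexOf`, which could be swapped for a dedicated string-matching algorithm.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-JVM simulation of a MapReduce search; not Hadoop code.
public class MapReduceSearchSketch {
    // Map step: scan one record and emit (pattern, "recordId:offset") for each hit.
    static List<Map.Entry<String, String>> map(String recordId, String sequence, String pattern) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        if (pattern.isEmpty()) {
            return out;
        }
        for (int i = sequence.indexOf(pattern); i >= 0; i = sequence.indexOf(pattern, i + 1)) {
            out.add(new SimpleEntry<>(pattern, recordId + ":" + i));
        }
        return out;
    }

    // Reduce step: gather all hit locations for one key into a sorted list.
    static List<String> reduce(String key, List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    public static Map<String, List<String>> run(Map<String, String> records, String pattern) {
        // Map phase: records are independent, so they can be scanned in parallel.
        Map<String, List<String>> grouped = records.entrySet().parallelStream()
                .flatMap(r -> map(r.getKey(), r.getValue(), pattern).stream())
                // Shuffle phase: group emitted values by key.
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        // Reduce phase: one reduce call per key.
        Map<String, List<String>> result = new TreeMap<>();
        grouped.forEach((k, v) -> result.put(k, reduce(k, v)));
        return result;
    }
}
```

Because every record here is kept whole, a match can never be cut in two. On a real Hadoop cluster, a match that straddles an input-split boundary would need extra handling, such as record-aware input splitting.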
Keywords/Search Tags: Search, MapReduce, Biological databases, Hadoop