
Research on the MapReduce-Based Algorithm for the Motif Discovery Problem

Posted on: 2013-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Lin
GTID: 2248330395955637
Subject: Computer software and theory
Abstract/Summary:
Motif discovery is one of the core problems in bioinformatics and is of great significance for research into gene expression regulation mechanisms. Implanted (l, d) motif discovery is a classic model of the problem, but it has been proved to be NP-hard. Because motif discovery involves a huge amount of computation, parallel computing is a natural choice.

In this thesis we introduce the MapReduce parallel computing model into motif discovery to address its high computational complexity. The MapReduce model lets the user focus on the problem to be solved: it decomposes the problem into two modules, Map and Reduce, and the system automatically parallelizes the computation, schedules tasks, and handles communication between nodes across large-scale clusters of machines.

Based on a thorough analysis of the implanted (l, d) motif discovery problem, and taking the characteristics of the MapReduce model into account, we designed an effective parallel algorithm, PMSPMR. Designing the algorithm requires careful consideration of data partitioning, because different data partitioning methods greatly influence the implementation of the algorithm. We describe several design approaches in detail and, after a full analysis and comparison, design and implement the PMSPMR algorithm on the Hadoop distributed computing platform. For problem instances of differing computational difficulty, experimental results on Hadoop clusters demonstrate that PMSPMR has good scalability.
Keywords/Search Tags: Motif discovery, parallel computing, MapReduce, Hadoop, PMSPMR
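
The abstract does not give the details of PMSPMR, so the following is not that algorithm. It is a minimal sketch of how the implanted (l, d) motif problem can be split into a Map step and a Reduce step, using a simple brute-force neighborhood vote. It assumes a DNA alphabet and runs in a single process rather than on Hadoop; the function names (`neighbors`, `map_phase`, `reduce_phase`, `find_motifs`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: DNA sequences


def neighbors(lmer, d):
    """Return the set of strings within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                chars = list(lmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def map_phase(seq_id, seq, l, d):
    """Map: emit (candidate motif, sequence id) for every l-mer neighbor."""
    for start in range(len(seq) - l + 1):
        for cand in neighbors(seq[start:start + l], d):
            yield cand, seq_id


def reduce_phase(cand, seq_ids, t):
    """Reduce: keep a candidate only if all t sequences voted for it."""
    if len(set(seq_ids)) == t:
        yield cand


def find_motifs(seqs, l, d):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for seq_id, seq in enumerate(seqs):
        for cand, sid in map_phase(seq_id, seq, l, d):
            groups[cand].append(sid)
    motifs = []
    for cand, ids in groups.items():
        motifs.extend(reduce_phase(cand, ids, len(seqs)))
    return sorted(motifs)
```

For example, the result of `find_motifs(["TTACGTTT", "GGACCTGG", "CCAAGTCC"], 4, 1)` includes `"ACGT"`, since each sequence contains a 4-mer within Hamming distance 1 of it. Here each input sequence is one map task; the choice of what a map task receives is the kind of data partitioning decision the abstract identifies as critical, and this sketch makes no claim about which partitioning the thesis adopts.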