A heterogeneous computing platform for biological sequence database searches

Posted on:2008-06-02

Degree:Ph.D

Type:Dissertation

University:Wayne State University

Candidate:Meng, Xiandong

GTID:1448390005964932

Subject:Computer Science

Abstract/Summary:

Due to the high cost of dedicated parallel computing and supercomputing machines, High Performance Computing (HPC) for today's enterprise computing infrastructure has emerged as a heterogeneous computing architecture that allows the integration of new commercial-off-the-shelf components or innovative implementations through the extension of services. To enable low-cost HPC, we have developed a hybrid computing platform, the Wayne Bio-Accelerator (WaBA), for high-throughput biological sequence analysis. It utilizes the existing enterprise computing infrastructure as well as various general-purpose computer architectures connected via the network.
WaBA is a heterogeneous computing platform that integrates different computer architectures, including legacy processors, conventional processors with SSE2 instructions, and reconfigurable coprocessors, into one system, and allows each to perform the task to which it is best suited. Accurate biological sequence database search algorithms such as the Smith-Waterman algorithm are the most sensitive, but their high computational complexity limits their use. WaBA accelerates the sensitive and time-consuming Smith-Waterman database search core with dynamic load balancing, data pre-fetching, database and query segmentation, and a series of optimizations.
The WaBA scheduling strategy automatically distributes the workload across multiple heterogeneous processors based on their processing capabilities. An efficient adaptive data pre-fetching scheme was designed for slow I/O interfaces, such as those of PCI-based reconfigurable computing systems, to overlap communication and computation time. The implementation eliminated a major portion of the data access penalty and improved performance by up to 42%.
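To make the Smith-Waterman search core mentioned above concrete, the following is a minimal Python sketch of local-alignment scoring and a brute-force database scan. It is not the WaBA implementation: the function names, the simple match/mismatch scores, and the linear gap penalty are assumptions chosen for illustration (real protein searches typically use a substitution matrix and affine gap penalties). It does show why the algorithm is expensive: every query/subject pair fills a full dynamic-programming table.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two sequences.

    Scoring values are illustrative assumptions, not WaBA's parameters.
    Only two rows of the DP table are kept, so memory is O(len(subject)).
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def search(query, database):
    """Score the query against every database entry, best hits first.

    `database` is a hypothetical dict mapping sequence IDs to sequences.
    """
    hits = [(seq_id, smith_waterman_score(query, seq))
            for seq_id, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Each comparison costs time proportional to the product of the two sequence lengths, and the comparisons against different database entries are independent of one another, which is what makes database segmentation across several processors possible.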
We also developed a set of WaBA API functions that hide the complexity of hardware programming and data format conversion, so that the WaBA accelerator can be connected seamlessly to existing biological sequence search tools. Furthermore, the parallel SSE2 implementation obtained a speedup of 143 on a cluster of 16 dual-CPU Intel processors compared with the sequential version that was widely used at the time. Additionally, the WaBA heterogeneous computing system demonstrated a speedup of 110 using only one reconfigurable coprocessor and 8 dual-core AMD processors. Clearly, the integrated heterogeneous computing architecture can support data- and compute-intensive life science applications at low cost.
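The abstract describes distributing database segments across processors according to their processing capabilities. One common way to approximate this kind of load balancing is to hand the next segment to whichever worker becomes free first. The sketch below simulates that policy; it is an illustration under stated assumptions and not the WaBA scheduler. The worker names, the relative speeds, and the `schedule_segments` function are all hypothetical.

```python
import heapq


def schedule_segments(segment_sizes, worker_speeds):
    """Simulate assigning segments to the worker that is free earliest.

    segment_sizes: list of work amounts (e.g. residues per database segment).
    worker_speeds: hypothetical dict of worker name -> work units per second.
    Returns (assignment, makespan), where assignment maps each worker name
    to the list of segment indices it processed.
    """
    # Heap entries are (time at which the worker becomes free, worker name).
    free_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(free_at)
    assignment = {name: [] for name in worker_speeds}
    for index, size in enumerate(segment_sizes):
        ready_time, name = heapq.heappop(free_at)
        assignment[name].append(index)
        finish_time = ready_time + size / worker_speeds[name]
        heapq.heappush(free_at, (finish_time, name))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


# Hypothetical speeds: a coprocessor, an SSE2 CPU, and a legacy CPU.
assignment, makespan = schedule_segments(
    [100] * 6, {"fpga": 4.0, "sse2": 2.0, "legacy": 1.0})
```

In this toy run the fastest worker ends up with three of the six segments and the slowest with one, so faster processors naturally absorb more of the workload without any fixed up-front partitioning.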

Keywords/Search Tags:

Computing, Biological sequence, Cost, Processors, Search, Platform, WaBA

Related items

1	Sequence and structure similarity search in biological and XML databases
2	Research And Implementation For Similarity Search Algorithm Of Biological Sequences
3	Biological sequence analysis using Hadoop/MapReduce as a distributed computing model
4	Automatically Get To Build The Study Of Biological Information Platform And Sequence Alignment Algorithm Based On Information
5	Research On Algorithm For Similarity Search Of Biological Sequence Database
6	Research And Implementation Of Multi-source Biodata Indexing And Processing Technology Based On Distributed Computing System
7	Research And Implementation Of Index Structure Of Biological Sequence
8	Research On Key Technologies Of Accelerator For Biological Sequence Analysis
9	Research On Key Technologies Of Parallel Optimization For Biological Sequence Analysis Algorithm Based On CPU+GPU Heterogeneous System
10	Algorithms for biological sequence problems