
Multi-softcore architectures and algorithms for a class of sparse computations

Posted on: 2011-09-24
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Wang, Qingbo
Full Text: PDF
GTID: 1448390002961662
Subject: Engineering
Abstract/Summary:
The field-programmable gate array (FPGA) is a representative reconfigurable computing platform that has been used in many applications to execute computationally intensive workloads. In this work, we study architectures and algorithms on FPGA for sparse computations. These computations have two distinguishing features: (1) the ratio of input and output operations to computation is high, and (2) most memory accesses are random with little or no data locality, which leads to low memory bandwidth utilization.

We propose the Multiple Application Specific Softcore architecture to overcome the performance hurdles that are inherent to sparse computations. We identify the critical issues, demonstrate our solutions, and validate the proposed architecture using two case studies: large dictionary string matching and breadth-first search on a graph. Our architecture uses multiple application-specific processing units (softcores) to exploit the potential thread-level parallelism in these computations. To alleviate the impact of long external memory latency on system performance, we devise a specialized memory architecture and a scheduling mechanism that reduce the number of accesses to external memory and hide the effects of the remaining accesses. Customized interconnects that adapt to communication demand support flexible and efficient data exchange and synchronization between softcores.

The two kernels in our study are among the most common sparse computation algorithms and are of practical significance on their own. String matching searches for all occurrences of a set of patterns (the dictionary) in a string of input data. It is the core function of search engines, intrusion detection systems (IDS), virus scanners, and spam and content filters. In our study on large dictionary string matching, our design achieved a throughput comparable to that of implementations on state-of-the-art multi-core computing systems.
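To make the string matching kernel concrete, the following is a minimal software sketch of dictionary matching using the Aho-Corasick automaton. This is an assumption made for illustration: the abstract does not state which matching algorithm the dissertation uses, and the function names `build_automaton` and `find_all` are hypothetical. The per-character state transitions are table lookups at data-dependent addresses, which hints at why large dictionaries produce the random, low-locality memory accesses described above.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton (illustrative, not the dissertation's design)."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # fail[state] -> state for the longest proper suffix that is in the trie
    out = [[]]    # out[state] -> patterns that end at this state
    for p in patterns:
        node = 0
        for ch in p:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(p)
    # Breadth-first traversal of the trie to fill in failure links.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, pattern) for every dictionary occurrence in text."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            matches.append((i - len(p) + 1, p))
    return matches

if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_all("ushers", automaton))
```

The input is scanned once regardless of dictionary size, so throughput in a sketch like this is bounded by the latency of the state lookups rather than by arithmetic.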
Breadth-first search is a fundamental building block for many graph algorithms, with applications in network analysis, image processing, and database queries. It is a difficult kernel to parallelize on cache-based multi-core systems because of its fine-grained random data accesses and the synchronization required between threads. We demonstrate that a message-passing multi-core architecture with a distributed barrier design can obtain high throughput using a modest amount of logic resources on FPGA.
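The message-passing approach to breadth-first search can be sketched in software as a level-synchronous traversal. The code below is a sequential simulation under stated assumptions, not the hardware design from the dissertation: the modulo vertex partitioning, the `num_cores` parameter, and the function name `bfs_message_passing` are all hypothetical. Each simulated core owns a subset of vertices, sends newly discovered neighbors to the owning core's inbox, and waits at a barrier before the next level begins.

```python
def bfs_message_passing(adj, source, num_cores=4):
    """Level-synchronous BFS over vertices partitioned across simulated cores.

    adj maps each vertex (an int) to an iterable of neighbor vertices.
    Returns a dict mapping each reachable vertex to its BFS level.
    """
    def owner(v):
        # Assumed partitioning: vertex v belongs to core v mod num_cores.
        return v % num_cores

    level = {source: 0}
    frontiers = [[] for _ in range(num_cores)]
    frontiers[owner(source)].append(source)
    depth = 0

    while any(frontiers):
        # Expansion phase: each core scans its local frontier and sends
        # every neighbor as a message to the core that owns it.
        inboxes = [[] for _ in range(num_cores)]
        for core in range(num_cores):
            for u in frontiers[core]:
                for v in adj.get(u, ()):
                    inboxes[owner(v)].append(v)

        # Barrier: in this simulation, all messages for the level have
        # been delivered once the loop above finishes.
        depth += 1

        # Update phase: each core marks only the vertices it owns, so no
        # two cores ever write the same vertex state.
        frontiers = [[] for _ in range(num_cores)]
        for core in range(num_cores):
            for v in inboxes[core]:
                if v not in level:
                    level[v] = depth
                    frontiers[core].append(v)
    return level

if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
    print(bfs_message_passing(graph, 0))
```

Because each vertex is updated only by its owner, the sketch needs no locks on shared vertex state; the only global coordination is the per-level barrier.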
Keywords/Search Tags:FPGA, Architecture, Sparse, Computations, Algorithms