With the development of computer technology, we have entered the multicore era. To harness the abundant hardware resources, parallel programming has become increasingly popular. However, due to the non-determinism of parallel software and the variety of concurrency bugs, writing robust parallel software is notoriously hard. Debugging concurrency bugs efficiently has therefore become an issue that urgently needs to be addressed. Unfortunately, many existing software-based approaches incur high runtime overhead, while most hardware-based proposals focus on a specific type of bug and are thus too inflexible to detect a variety of concurrency bugs. In this paper, we propose Hydra, an approach that leverages the massive parallelism and programmability of a fused GPU architecture to simultaneously detect multiple types of concurrency bugs, including data races, atomicity violations, and order violations. Hydra instruments the program and collects its behavior on the CPU, then transfers the traces to the GPU through the on-chip interconnect for bug detection. Furthermore, Hydra exploits three optimizations to achieve high speed and accuracy: 1) using a Bloom filter to filter out unnecessary detection traces; 2) avoiding eviction of shared traces; and 3) comparing only last-write traces for shared data with a happens-before relation. Hydra adds little hardware complexity, requires no changes to internal critical-path processor components such as the cache and its coherence protocol, and incurs about 1.1% hardware overhead under a 32-core configuration. Experimental results show that Hydra introduces only about 0.18% overhead on average when detecting one type of bug and 0.46% overhead when simultaneously detecting multiple types of bugs, while offering detectability similar to that of a heavyweight software bug detector (e.g., Helgrind).
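
To make the first and third optimizations concrete, the following is a minimal software sketch and not Hydra's actual GPU implementation. It assumes a hypothetical trace format `(thread_id, address, is_write, vector_clock)`, assumes the Bloom filter is used to discard traces whose address is touched by only one thread, and assumes happens-before is tracked with vector clocks. The names `BloomFilter`, `filter_shared`, `happens_before`, and `find_races` are illustrative only.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def filter_shared(traces, num_threads):
    """Keep only traces whose address may also be accessed by another thread.

    Assumption: one Bloom filter per thread records the addresses it touched.
    """
    filters = [BloomFilter() for _ in range(num_threads)]
    for tid, addr, _is_write, _vc in traces:
        filters[tid].add(addr)
    return [
        t for t in traces
        if any(filters[o].might_contain(t[1]) for o in range(num_threads) if o != t[0])
    ]


def happens_before(vc_a, vc_b):
    """True if vector clock vc_a is ordered strictly before vc_b."""
    return vc_a != vc_b and all(a <= b for a, b in zip(vc_a, vc_b))


def find_races(traces):
    """Compare each access only against the last write to the same address."""
    last_write = {}  # address -> (thread_id, vector_clock) of the latest write
    races = []
    for tid, addr, is_write, vc in traces:
        prev = last_write.get(addr)
        if prev is not None and prev[0] != tid and not happens_before(prev[1], vc):
            races.append((addr, prev[0], tid))
        if is_write:
            last_write[addr] = (tid, vc)
    return races
```

Because a Bloom filter has no false negatives, the filtering step in this sketch never drops a genuinely shared access; a false positive only lets an extra private trace through to the detector. Checking against the last write alone keeps per-address state small, at the cost of missing a write that races with an earlier read, so the sketch is a simplification of full race detection.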