Research On Resources Scheduling For Irregular Applications On Graphics Processing Units

Posted on:2014-11-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Mu

Full Text:PDF

GTID:1228330452453597

Subject:Electronic Science and Technology

Abstract/Summary:

Recently, GPUs (Graphics Processing Units) have been widely adopted in many scientific and engineering applications, such as graphics and image processing, scientific computing, multimedia applications, data mining, and financial computing. GPUs are inherently suited to regular applications because they follow the SIMD (Single Instruction Multiple Data) execution model. However, the irregular patterns that are pervasive in computation and memory operations have become the performance bottleneck of GPU applications. Irregular patterns such as unbalanced workloads, divergent control flow, irregular memory access and poor data locality are exhibited in almost all aspects of computer architecture design. Therefore, it is critical to minimize the overhead of processing such irregular patterns for better performance. This work aims to overcome these obstacles from two perspectives: designing efficient algorithms and optimizing micro-architectures. The contributions of this thesis are as follows:

(1) We analyze and optimize three irregular applications: sparse matrix vector product (SMVP), string matching and QR decomposition. For SMVP, a technique is proposed to eliminate irregular memory accesses by expanding the vector. For string matching, we devise two efficient techniques, data partitioning and data reordering, to address the irregular computation and memory access patterns simultaneously. For QR decomposition, we exploit pipelined parallelism by taking the inherent data dependence into account. Our techniques deliver an average speed-up of one order of magnitude over the CPU implementation.

(2) We conduct a systematic analysis of the characteristics of GPU programs. Our analysis shows that the irregular patterns cause low utilization of GPU resources. On the one hand, the unbalanced memory access latency introduced by memory latency divergence results in under-utilization of the multiprocessors. On the other hand, current cache management cannot adapt to the complex memory access patterns. Therefore, GPU programs cannot fully exploit the cache resources.

(3) We develop a cache management policy called Effective Address Based Priority and a memory scheduling policy called Divergence Aware Memory Scheduling. These two microarchitecture techniques improve cache efficiency and reduce the impact of memory latency divergence concurrently. Experimental results show that the cache miss rate can be reduced by 20% and system performance can be improved by 30%.

(4) For the unbalanced task workloads that exist in stream processing, we develop a dynamic resource scheduling policy. Under this policy, the workload of each task is monitored and the amount of data transferred between tasks is calculated. The computation and cache resources can therefore be allocated to each task in a dynamically tuned manner. Experimental results show that our dynamic scheduling policy can improve system performance by 20% compared to current GPU preemptive scheduling.
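
The abstract does not say how the vector is expanded for SMVP, so the following is only a minimal sketch of one possible reading, not the thesis's method. It assumes a CSR matrix layout and uses hypothetical names (`spmv_csr`, `expand_vector`, `spmv_expanded`). The baseline reads the vector through the column indices, which is the irregular access; the variant first gathers the vector so that entry k lines up with nonzero k, after which the multiply loop reads both arrays sequentially.

```python
# Illustrative sketch only: the CSR layout and all function names are
# assumptions made for this example, not taken from the thesis.

def spmv_csr(row_ptr, col_idx, values, x):
    """Baseline y = A @ x for A in CSR form; x is read through col_idx."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect, data-dependent read
        y.append(acc)
    return y


def expand_vector(col_idx, x):
    """Gather x so that x_expanded[k] pairs with the k-th stored nonzero."""
    return [x[c] for c in col_idx]


def spmv_expanded(row_ptr, values, x_expanded):
    """Same product, but values and x_expanded are both read in order."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x_expanded[k]  # sequential read
        y.append(acc)
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

x_expanded = expand_vector(col_idx, x)
y_baseline = spmv_csr(row_ptr, col_idx, values, x)
y_expanded = spmv_expanded(row_ptr, values, x_expanded)
```

In this sketch the expanded vector needs one entry per nonzero, and the gather step itself still touches `x` irregularly. Whether and how the thesis avoids that cost is not stated in the abstract.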

Keywords/Search Tags:

GPU, SIMD, Irregular, Cache Management, Memory Scheduling

Related items

1	Memory Optimization On Chip Multi-core Processors
2	Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms
3	Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path
4	Design And Implementation Of SIMD Unaligned Memory Access Structure
5	Research On Auto-Vectorization Compiling Techniques Oriented To Irregular Applications On SIMD Extension
6	Towards Adaptive Cache Management For Dataflow Computation With Memory Resource Constraints
7	Research On Cache Optimization Mechanism In Heterogeneous Memory Environment
8	Design And Implementation Of Distributed Cache Management System For In-memory Columnar Database
9	Efficiently and Transparently Maintaining High SIMD Occupancy in the Presence of Wavefront Irregularit
10	Research On Stream Program Based On Dynamic Rate Edges Task Scheduling And Cache Optimization