
Technologies for Energy-Efficient and High-Performance GPGPU Computing

Posted on: 2016-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Zhang
GTID: 1108330503493767
Subject: Computer Science and Technology
Abstract/Summary:
High-performance computing (HPC) is a key fundamental technology for both science and engineering. It covers the research and development of the computer architectures and software technologies of high-performance computers. HPC technologies have been widely applied in domains such as cloud computing, Big Data processing and the Internet of Things to enlarge the scale of experiments, improve efficiency and achieve breakthroughs. Rankings such as the Top500 list of supercomputers reflect the level of a country's HPC research and even its overall national strength. By hardware, high-performance computer systems can be categorized as homogeneous or heterogeneous. Heterogeneous systems are those that combine different types of processors, such as CPUs and GPUs (Graphics Processing Units), and they offer advantages in performance and energy efficiency. HPC based on heterogeneous CPU-GPU systems, called GPU HPC, is one of today's active research topics. This dissertation focuses on improving the energy efficiency and performance of GPU HPC and on extending its applications. It proposes novel hardware and software technologies at three levels: computer architecture, system software, and applications.

The first part of the work proposes an energy-efficient GPU processor architecture. In recent years, because of their massive parallelism, good energy efficiency and high memory bandwidth, GPUs have been deployed in supercomputers, data centers and various research platforms to accelerate scientific computing. The high power consumption of large computer systems has motivated much research, and the front-end units that process instructions account for a significant portion of GPU energy. We propose to split the streaming multiprocessors (SMs) and to group several adjacent SMs that execute synchronously so that they share the front-end units, thereby saving energy.
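The energy argument for sharing front-end units can be illustrated with an idealized accounting model. This is a hypothetical sketch, not the dissertation's hardware design: it assumes each SM is described by a per-cycle trace of program counters, treats a group of adjacent SMs as synchronized in a cycle only when all of them issue the same PC, and charges one shared fetch/decode activation to a synchronized group and one activation per SM otherwise. The function name `front_end_activations` and the trace format are illustrative assumptions.

```python
def front_end_activations(pc_traces, group_size):
    """Count fetch/decode activations for SMs grouped into buddies of group_size.

    pc_traces holds one equal-length list of program counters per SM (one entry
    per cycle). Hypothetical, idealized rule: if every SM in a group issues the
    same PC in a cycle, the group shares one front-end activation; otherwise the
    group is treated as ungrouped and each SM activates its own front-end.
    """
    if not pc_traces:
        return 0
    n_cycles = len(pc_traces[0])
    if any(len(trace) != n_cycles for trace in pc_traces):
        raise ValueError("this sketch assumes equal-length traces")
    activations = 0
    # Adjacent SMs form a group (consecutive slices of the SM list).
    for start in range(0, len(pc_traces), group_size):
        group = pc_traces[start:start + group_size]
        for cycle in range(n_cycles):
            in_sync = len({trace[cycle] for trace in group}) == 1
            # One shared activation when in sync, else one per SM in the group.
            activations += 1 if in_sync else len(group)
    return activations


if __name__ == "__main__":
    same = list(range(100))
    # One SM takes a different path during cycles 50-59.
    diverged = [pc + 1000 if 50 <= pc < 60 else pc for pc in same]
    print(front_end_activations([same] * 4, group_size=1))  # 400
    print(front_end_activations([same] * 4, group_size=4))  # 100
    print(front_end_activations([same, same, same, diverged], group_size=4))  # 130
```

In this toy model, four lockstep SMs sharing one front-end need a quarter of the activations, and a 10-cycle divergence costs 90 + 10 × 4 = 130 activations instead of 100. The model ignores the cost and correctness issues of grouping, ungrouping and regrouping SMs.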
Mechanisms for grouping, ungrouping and regrouping SMs are designed to ensure correct and efficient execution of the Buddy SM architecture. The hardware implementation details and methods for improving performance are also introduced. Experiments show that the Buddy SM architecture eliminates a significant portion of the front-end energy and thus improves the energy efficiency of the entire GPU processor.

The second part of the work deals with system software for GPU HPC. As the dominant massively parallel accelerators, GPUs have been used to accelerate numerous throughput-oriented regular applications with significant speedups. However, algorithms from domains such as data mining, physics simulation and optimization theory build, traverse and update irregular data structures such as trees, graphs and priority queues. These algorithms generally exhibit unpredictable, input-dependent behavior and suffer performance degradation on GPUs because of the mismatch between the GPU's SIMT architecture and their irregular behavior. An experimental study shows that thread-level load imbalance is a major source of this degradation. We therefore propose a task pool approach to balance the load at thread level on GPUs; it is the first method able to do so. We further implement an open-source library that characterizes and balances irregular workloads on GPUs. Finally, several real-world applications are characterized and optimized with the library, achieving significant speedups.

The third part of the work explores applications of GPUs in Big Data processing, one of the most active research topics today. A Gartner report predicted that the volume of data worldwide will grow eightfold in the next five years and that 80% of it will be unstructured.
Unstructured data, such as social-network data, are mostly stored as graphs, so efficient graph processing is in demand in many domains, including data analytics, network search and recommender systems. We design and implement a non-distributed, general-purpose graph processing platform for hybrid CPU-GPU systems. Owing to its graph partitioning, storage and prefetching mechanisms, the platform runs on a heterogeneous PC with CPU and GPU processors and processes large-scale graphs that exceed the capacity of the PC's memory. We compare the platform with other graph processing platforms in terms of performance and energy efficiency.

Applied together, the hardware and software technologies in this dissertation can improve the performance and energy efficiency of current GPU HPC and extend its areas of application.
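The partition-and-prefetch idea behind the graph platform can be sketched on a single machine. This is a hypothetical, simplified illustration rather than the platform's actual design: it assumes edges are grouped by source vertex into partitions whose edge count stays under a budget (`max_edges`, standing in for available memory), that the inner `load` function stands in for reading one partition from disk, and that one background thread prefetches partition i + 1 while partition i is processed. One push-style PageRank iteration serves as the example algorithm; all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def partition_by_source(edges, max_edges):
    """Group (src, dst) edges into partitions covering contiguous source-vertex
    ranges. All out-edges of a vertex stay together, so a partition exceeds
    max_edges only if a single vertex alone has more out-edges than that."""
    partitions, current = [], []
    for _, group in groupby(sorted(edges), key=itemgetter(0)):
        out_edges = list(group)
        if current and len(current) + len(out_edges) > max_edges:
            partitions.append(current)
            current = []
        current.extend(out_edges)
    if current:
        partitions.append(current)
    return partitions


def streamed_pagerank_step(num_vertices, partitions, rank, damping=0.85):
    """One push-style PageRank iteration that visits partitions in order while
    prefetching the next one on a background thread (double buffering)."""

    def load(i):
        # Hypothetical stand-in for reading partition i from disk into memory.
        return list(partitions[i])

    new_rank = [(1.0 - damping) / num_vertices] * num_vertices
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, 0) if partitions else None
        for i in range(len(partitions)):
            part = pending.result()
            # Start loading partition i + 1 before computing on partition i.
            if i + 1 < len(partitions):
                pending = loader.submit(load, i + 1)
            # A partition holds every out-edge of its sources, so out-degrees
            # can be computed locally.
            out_degree = {}
            for src, _ in part:
                out_degree[src] = out_degree.get(src, 0) + 1
            for src, dst in part:
                new_rank[dst] += damping * rank[src] / out_degree[src]
    # Rank held by vertices with no out-edges is not redistributed here.
    return new_rank


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
    parts = partition_by_source(edges, max_edges=2)
    print(len(parts))  # 3
    print(streamed_pagerank_step(4, parts, [0.25] * 4))
```

Partitioning by source keeps each vertex's out-edges in one partition, which lets out-degrees be computed per partition. The result matches an unpartitioned pass; only the peak number of edges held in memory changes.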
Keywords/Search Tags: High-performance Computing, Energy Efficiency, GPU, Processor Architectures, Irregular, Load Balancing, Characterization, Graph Processing, Graph Algorithms