Key Research On Microarchitecture Of High Efficient GPU

Posted on:2019-04-30

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:1368330623950480

Subject:Electronic Science and Technology

Abstract/Summary:

Multi-core and many-core processors with high computing power and high energy efficiency are key to improving the performance of high-performance computer systems. This dissertation explores a new many-core processor architecture that meets the needs of high performance and high energy efficiency, so that it can be applied to the next generation of supercomputer systems with exascale (E-level) computing. We carry out in-depth research on the architecture of the GPU, a commonly used many-core processor. The main research achievements are as follows.

1. A thread scheduling method based on locality protection and latency hiding is proposed. Building on the current scheduler, a better-performing scheduler is designed and implemented that better maintains data locality and hides long memory latency. The method improves performance by an average of 2.2% over the baseline method across different benchmarks, with negligible hardware overhead.

2. The traditional LRU replacement policy used in the cache is improved, and a locality information collector based on PC information is designed. In addition, a cache allocation unit that works in coordination with the improved LRU unit is designed to better assign priorities to cache blocks and optimize the eviction policy. Evaluated in a GPU simulator, this optimization improves performance by an average of 5.0% over the baseline method at low hardware overhead.

3. A collaborative cache management and thread bundle scheduling method is proposed. It uses the locality information collected in the cache to guide cache management and thread bundle scheduling at the same time. Based on the information collected by the collector, two thread scheduling methods are proposed: thread bundle rearrangement based on reuse information (CWLP), and CTLP, which is also based on reuse information. CWLP achieves an average performance improvement of 8.8% over the baseline scheduling method and 4.8% over the latest scheduling method. CTLP improves the performance of cache-friendly programs by up to 24.5%, and by 13.6% on average.

In summary, this dissertation studies GPU microarchitecture for high-performance parallel computing, addresses deficiencies in GPU scheduling methods and memory system management, and improves the computational efficiency of the GPU, laying a foundation for the design of the next generation of high-performance many-core processors.
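The abstract does not give the details of the PC-based collector or the cache allocation unit, so the following is only an illustrative sketch of the general idea of PC-guided insertion priority on top of LRU, not the dissertation's design. The class name `PCGuidedLRUCache`, the saturating counters, the threshold, and the fully associative organization are all assumptions made for this example.

```python
from collections import OrderedDict


class PCGuidedLRUCache:
    """Toy fully associative cache: LRU with PC-guided insertion priority.

    Hypothetical model for illustration only; all names and parameters
    here are assumptions, not taken from the dissertation.
    """

    def __init__(self, num_lines, counter_max=3, threshold=1):
        self.num_lines = num_lines
        self.counter_max = counter_max
        self.threshold = threshold
        # addr -> [inserting_pc, reused_flag]; first item is LRU, last is MRU.
        self.lines = OrderedDict()
        # pc -> saturating reuse counter (stands in for a locality collector).
        self.reuse_counter = {}
        self.hits = 0
        self.misses = 0

    def _counter(self, pc):
        # Unseen PCs start at the threshold, i.e. they are assumed to reuse.
        return self.reuse_counter.get(pc, self.threshold)

    def access(self, pc, addr):
        """Simulate one access; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.hits += 1
            entry = self.lines[addr]
            entry[1] = True  # the line has now been reused
            owner = entry[0]
            self.reuse_counter[owner] = min(
                self.counter_max, self._counter(owner) + 1
            )
            self.lines.move_to_end(addr)  # promote to MRU
            return True

        self.misses += 1
        if len(self.lines) >= self.num_lines:
            # Evict the LRU line; penalize its PC if the line was never reused.
            _, (owner, reused) = self.lines.popitem(last=False)
            if not reused:
                self.reuse_counter[owner] = max(0, self._counter(owner) - 1)

        self.lines[addr] = [pc, False]  # inserted at MRU by default
        if self._counter(pc) < self.threshold:
            # PC predicted to show no reuse: insert at the LRU position instead.
            self.lines.move_to_end(addr, last=False)
        return False


if __name__ == "__main__":
    cache = PCGuidedLRUCache(num_lines=2)
    # PC 0xB streams through addresses; PC 0xA keeps reusing address 1.
    trace = [(0xB, 100), (0xB, 101), (0xB, 102), (0xA, 1), (0xA, 1),
             (0xB, 104), (0xB, 105), (0xA, 1)]
    for pc, addr in trace:
        cache.access(pc, addr)
    print("hits:", cache.hits, "misses:", cache.misses)
```

In this toy trace, once the streaming PC has been observed to show no reuse, its lines are inserted at the LRU position, so the reused line survives accesses that would evict it under plain LRU.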

Keywords/Search Tags:

High performance coprocessor, multithreading, GPU, cache, thread scheduler, cache management strategy, long delay hiding

Related items

1	Adaptive Cache Management Policies For High Performance Microprocessors
2	Optimization Techniques Of Cache In Chip MultiThreading
3	Application Research Of Data Cache Technology In MIS
4	The Design And Simulation Of High Efficient Cache Protocol In Multi-processors System
5	Design And Implementation Of Distributed Cache Management System For In-memory Columnar Database
6	Research On High Performance And Low Power Design Of Embedded Memory Management Unit
7	Research On Utility Based Probabilistic Routing Algorithm And Cache Management Strategy In Delay-tolerant Networks
8	A Design And Implementation Of Inter-Thread Cache Interference Elimination Structure Based On Cache Partitioning
9	Research On Performance Analysis And Joint Optimization Of Cache And Wireless Transmission
10	The Research Of Shared CMP Cache Management