
Key Research on the Microarchitecture of High-Efficiency GPUs

Posted on: 2019-04-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
GTID: 1368330623950480
Subject: Electronic Science and Technology
Abstract/Summary:
Multi-core and many-core processors with high compute power and high energy efficiency are key to improving the performance of high-performance computer systems. This dissertation explores a new many-core processor architecture that meets the demands of high performance and high energy efficiency, so that it can be applied to next-generation exascale (E-level) supercomputer systems. We carry out in-depth research on the architecture of the GPU, a commonly used many-core processor. The main research achievements are as follows.

1. A thread scheduling method based on locality protection and latency hiding is proposed. Building on the current scheduler, a scheduler with better performance is designed and implemented; it better preserves data locality and hides long memory latency. Across different benchmarks, this method improves performance by an average of 2.2% over the baseline method, with negligible total hardware overhead.

2. The traditional LRU replacement policy used in the cache is improved, and a locality information collector based on PC (program counter) information is designed. In addition, a cache allocation unit that works together with the improved LRU unit is designed to better assign priorities to cache blocks and to optimize the eviction policy. Evaluated in a GPU simulator, this optimization improves performance by an average of 5.0% over the baseline method at low hardware overhead.

3. A collaborative cache management and warp (thread bundle) scheduling method is proposed. It uses the locality information collected in the cache to guide cache management and warp scheduling simultaneously. Based on the information gathered by the locality collector, two thread scheduling methods are proposed: CWLP, a warp reordering method based on reuse information, and CTLP, which is also based on reuse information. CWLP achieves an average performance improvement of 8.8% over the baseline scheduling method and of 4.8% over the latest scheduling method. CTLP improves the performance of cache-friendly programs by up to 24.5%, and by 13.6% on average.

In summary, this dissertation studies high-performance parallel computing based on GPU processor microarchitecture, addresses deficiencies in GPU scheduling and memory-system management, and improves the computational efficiency of GPUs, laying a foundation for the design of next-generation high-performance many-core processors.
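The abstract does not describe the hardware in detail. The following is a minimal Python sketch, under our own assumptions, of the general idea behind contributions 2 and 3: a PC-indexed reuse table (the hypothetical `ReuseTable`, using 2-bit saturating counters) collects locality information, the LRU insertion position of a new line is chosen from that prediction, and the same table biases warp selection (the hypothetical `pick_warp`). It is not the dissertation's CWLP or CTLP design; class names, counter widths, thresholds and the oldest-warp fallback are illustrative choices only.

```python
from collections import OrderedDict


class ReuseTable:
    """Hypothetical PC-indexed locality collector with saturating counters."""

    def __init__(self, max_count=3, threshold=2):
        self.counters = {}          # load PC -> counter value
        self.max_count = max_count
        self.threshold = threshold

    def predict_reuse(self, pc):
        # PCs never seen before start at the threshold, i.e. assumed reusable.
        return self.counters.get(pc, self.threshold) >= self.threshold

    def train(self, pc, reused):
        # Count up when a line inserted by this PC was reused, down otherwise.
        c = self.counters.get(pc, self.threshold)
        self.counters[pc] = min(c + 1, self.max_count) if reused else max(c - 1, 0)


class CacheSet:
    """One set of an LRU cache; the front of the OrderedDict is the LRU end."""

    def __init__(self, ways, table):
        self.ways = ways
        self.table = table
        self.lines = OrderedDict()  # tag -> [inserting PC, reused flag]

    def access(self, tag, pc):
        """Return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines[tag][1] = True
            self.lines.move_to_end(tag)          # promote to MRU
            return True
        if len(self.lines) >= self.ways:
            _, (victim_pc, reused) = self.lines.popitem(last=False)  # evict LRU
            self.table.train(victim_pc, reused)  # feed locality info back
        self.lines[tag] = [pc, False]            # default: insert at MRU
        if not self.table.predict_reuse(pc):
            # Predicted streaming access: insert at the LRU end so it leaves first.
            self.lines.move_to_end(tag, last=False)
        return False


def pick_warp(ready_warps, next_pc, table):
    """Prefer ready warps whose next load PC is predicted to reuse cached data;
    otherwise fall back to the oldest ready warp (lowest id). Needs a non-empty list."""
    reusers = [w for w in ready_warps if table.predict_reuse(next_pc[w])]
    return min(reusers) if reusers else min(ready_warps)
```

Inserting predicted-streaming lines at the LRU end protects lines brought in by reusable PCs, while the scheduler reads the same table, so one structure informs both cache management and warp scheduling. A real design would also bound the table size and index it by hashed PC bits.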
Keywords/Search Tags: high-performance coprocessor, multithreading, GPU, cache, thread scheduler, cache management strategy, long-latency hiding