
Research On Shared Cache Access Fairness For Many-Core Processor

Posted on: 2020-10-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z C Wang | Full Text: PDF
GTID: 1488306548991329 | Subject: Electronic Science and Technology

Abstract/Summary:
As the scale of systems-on-chip grows and the number of cores increases, systems place higher demands on on-chip cache in terms of both capacity and speed. To utilize cache resources effectively, the non-uniform cache architecture (NUCA) was proposed to support high-capacity, low-latency cache organization. Meanwhile, networks-on-chip (NoC) offer significant advantages for interconnecting many-core processors owing to their good scalability. NoC-based NUCA is therefore gradually becoming the dominant architecture for organizing large caches. In such an architecture, the last-level cache (LLC) is distributed across the network nodes, and all cache banks logically constitute a unified shared cache. When a core issues a cache access request, the access time is determined by the network distance between the core and the requested cache bank: a nearby bank is reached quickly, while a distant bank takes longer. As the system scales up, the gap in communication distance and latency between different cores also widens, because access latency is tied to network distance. Moreover, as the NoC grows, cache access latency becomes increasingly dominated by network latency. This latency gap causes network latency imbalance, aggravates the non-uniformity of cache access latencies, and produces more accesses with excessive latency, which become the system bottleneck. Research on shared cache access fairness in many-core processors therefore has positive significance for improving network and system performance. This thesis targets the fairness of shared cache access and proposes three methods, addressing the NoC router microarchitecture, the link distribution in the NoC, and the memory mapping in the many-core processor, to optimize cache access fairness. The main contributions and innovations of this thesis are as follows:

(1) A switch allocation strategy for the NoC router. The heart of the router datapath is the crossbar switch, which plays a critical role in scheduling packets and determining network latency. We aim to alleviate the unbalanced network latency problem caused by large-scale NoCs and propose a fairness-oriented switch allocation (FOSA) strategy. The canonical separable switch allocator is unaware of the congestion at each port, so it cannot determine which ports are most likely to cause congestion and produce high-latency packets. Compared with the canonical separable switch allocator and the recently proposed TS-Router, experiments show that our approach decreases the latency standard deviation by 13.8% and 3.9%, respectively, and the maximum latency by 45.6% and 15.1%, respectively. These results indicate that FOSA not only effectively improves network latency balance but also reduces the impact of high-latency packets on overall system performance.

(2) A load-balanced link distribution strategy for the NoC. As a mesh-based network scales up, the non-equivalence of link locations gradually causes unbalanced traffic load across links. Departing from the traditional uniform interconnection between network nodes, we propose a load-balanced link distribution scheme that assigns physical channels according to the traffic load of each link. We analyze the traffic load distribution of mesh networks at different scales and give the corresponding load-balanced link distributions. Experimental results show that the strategy effectively balances network traffic across links using a small number of physical channels, and the benefit becomes more apparent as the network grows. Experiments with the PARSEC benchmarks reveal that the load-balanced link distribution strategy reduces average network latency by up to 6.97%, and by 4.22% on average; in terms of system performance, IPC increases by 2.1% on average.

(3) A non-uniform memory mapping strategy for optimizing shared cache access fairness. In many-core processors, high-latency cache accesses tend to become the bottleneck of system memory access, so shared cache access fairness has a very important impact on system performance. The memory-to-LLC mapping scheme in a many-core processor effectively determines the average access cost of each cache bank, whereas the traditional static NUCA (S-NUCA) architecture generally applies a simple uniform memory-to-LLC mapping. This thesis proposes a non-uniform memory-to-LLC mapping scheme, F-NUCA, which balances cache access latencies by adjusting the average access cost of each cache bank. Compared with traditional S-NUCA, F-NUCA optimizes cache access fairness under different network scales: the average reduction in latency standard deviation (LSD) was 0.7%/7.7%/19.6%, and the average reduction in maximum latency (ML) was 2.9%/11.6%/12.8%. In terms of system performance, experimental results on the PARSEC benchmarks show that F-NUCA improves performance by up to 2.1%/3.9%/14.0% on 16/32/64-core systems, with average improvements of 1.1%/2.1%/6.7%.
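The latency non-uniformity that motivates this work can be illustrated with a minimal sketch (illustrative only; the mesh size and routing assumptions are not taken from the thesis): under a uniform S-NUCA mapping in an 8x8 mesh with XY routing, where every core is equally likely to access every bank, a corner bank's mean Manhattan distance to all cores is far larger than a center bank's. This per-bank distance gap is the imbalance that a non-uniform memory-to-LLC mapping seeks to rebalance.

```python
# Sketch: average access distance per cache bank in an n x n mesh,
# assuming XY routing (hop count = Manhattan distance) and uniform
# access traffic from every core to every bank. Illustrative only.

def avg_hops_to_bank(n, bx, by):
    """Mean Manhattan distance from all n*n cores to the bank at (bx, by)."""
    total = sum(abs(bx - cx) + abs(by - cy)
                for cx in range(n) for cy in range(n))
    return total / (n * n)

n = 8  # 64-node mesh
corner = avg_hops_to_bank(n, 0, 0)
center = avg_hops_to_bank(n, n // 2, n // 2)
print(f"corner bank avg hops: {corner:.2f}")  # -> 7.00
print(f"center bank avg hops: {center:.2f}")  # -> 4.00
```

The corner bank averages 7 hops per access versus 4 for the center bank, so under a uniform mapping the banks are inherently unequal in access cost; a mapping that shifts load toward cheaper banks can narrow the latency spread.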
Keywords/Search Tags:Many-Core Processor, Non-Uniform Cache Access, Cache Access Fairness, Networks-on-Chip, Switch Allocation, Link Distribution, Load-balanced, Memory Mapping