Chip multiprocessors (CMPs) have become a research focus of both industry and academia. To relieve the "memory wall" problem, CMPs use a multi-level cache hierarchy to hide the performance gap between processors and main memory. The last-level cache (LLC), located at the bottom of this hierarchy, is the last barrier against off-chip memory accesses; its miss penalty is very large and has a critical impact on performance. For ease of design and implementation, core scalability, and economic reasons, tiled chip multiprocessors have become the mainstream design option, so LLC optimization in tiled CMPs has become a hot topic in computer architecture. As core counts scale and application working sets grow larger, the LLC in tiled CMPs faces the challenges of long on-chip access latency and excessive storage space consumption. To meet these challenges, this paper takes LLC performance and storage-space optimization in tiled CMPs as its research goal and proposes a series of LLC optimization strategies applicable to different scenarios. The main work and contributions of this paper are as follows:

Firstly, a perceptron-based LLC data replication policy is proposed for cache-intensive applications. LLC data replication policies using conventional optimization methods rely on only a single feature to predict the reusability of L1 victims and guide replica selection, resulting in low accuracy of victim reuse prediction and replica selection and limiting performance improvement; they are therefore poorly suited to cache-intensive applications with high reusability. To solve this problem, a perceptron-based LLC data replication policy, PBR, is proposed. PBR uses a perceptron to effectively combine four features related to reuse prediction (address, program counter, data type, and access count) to predict reusability and guide the replica selection process. PBR can thus adapt to changes in cache-block reusability during program execution or across different programs, improving the accuracy of reuse prediction and replica selection and thereby improving performance. Experimental results show that this policy improves performance, reduces network traffic, and keeps storage overhead within a reasonable range; it is especially suitable for improving the performance of cache-intensive applications.

Secondly, a two-level cache-aware LLC adaptive data replication policy is proposed for non-cache-intensive applications. Because PBR's multi-feature reuse prediction introduces extra logic overhead, it offsets part of the local LLC hit gains and is not suitable for non-cache-intensive applications, while existing single-level data replication mechanisms lead to low accuracy of replica selection or increased LLC pressure, limiting performance improvement. To solve this problem, a two-level cache-aware LLC adaptive data replication policy, TCDR, is proposed. TCDR senses both cache levels simultaneously to optimize data replication: it predicts L1 victim reuse behavior and monitors the LLC's replica reception capability, so as to select replicas with high reuse locality and/or short reuse distance and insert them at the appropriate MRU or LRU position in the LLC. TCDR not only improves the accuracy of L1 replica selection but also avoids increasing LLC pressure. Experimental results show that this policy effectively improves performance and keeps storage overhead within a reasonable range; it is especially suitable for improving the performance of non-cache-intensive applications.

Thirdly, a reuse-degree-based hardware-efficient locality classifier structure design method is proposed to save hardware resources. Existing locality-aware LLC data replication policies sense only the locality of cache blocks and ignore how many cores reuse them, resulting in limited performance improvement and wasted storage space. To solve this problem, a new concept, "reuse degree (RD)," is defined to represent how many cores reuse a cache block. Experimental observation then shows that most cache blocks in the LLC have an RD value of 0 or 1, for which expanding the locality information storage to three cores is unnecessary, while only a small number of cache blocks have RD values greater than 1 and need complete locality information. Based on this observation, a reuse-degree-based hardware-efficient locality classifier structure design method, RDHD, is proposed. RDHD adopts a reuse-degree-based locality classifier, RD_LC, which senses both the reuse degree and the locality of cache blocks to control replica selection. The locality information array of RD_LC is decoupled from the tag array and divided into single and complete locality information arrays; each cache block's information is saved in the corresponding array according to its RD value. Experimental results show that this method not only reduces storage overhead but also further improves performance and reduces network traffic. RDHD is well suited to application scenarios that demand low hardware overhead and high performance.

Finally, an inclusive LLC selective allocation policy for storage-space reduction is proposed. Inclusive LLC selective allocation policies using conventional optimization methods rely on only a single feature to predict the reusability of LLC cache blocks and guide data-array selective allocation, resulting in insufficient reuse prediction accuracy and limiting LLC space reduction. To solve this problem, an inclusive LLC selective allocation policy for storage-space reduction, SASR, is proposed. SASR uses a perceptron to effectively combine three features related to reuse prediction (address, program counter, and reuse locality) to predict reusability, and allocates data-array entries only for cache blocks that will be reused, improving the accuracy of reuse prediction and the efficiency of LLC space reduction. Experimental results show that this policy minimizes inclusive LLC space while preserving performance, with negligible additional storage overhead.

To sum up, this paper studies the optimization of LLC performance and storage space for tiled CMPs, proposes a variety of LLC optimization techniques, and applies machine learning (ML) optimization methods to the two problems of data replication and selective allocation, effectively improving performance and saving storage space.
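Both PBR and SASR rest on the same underlying idea: combine several hashed features through perceptron weights and compare the weight sum against a threshold to predict reuse. The sketch below illustrates this general hashed-perceptron mechanism in Python; the class name, feature values, table size, and weight limits are illustrative assumptions and not the actual parameters of the proposed policies.

```python
class PerceptronReusePredictor:
    """Illustrative hashed-perceptron reuse predictor: each feature indexes
    its own small weight table, and the sum of the selected weights is
    compared against a threshold to predict whether a block will be reused."""

    def __init__(self, num_features, table_size=256, threshold=0,
                 w_max=31, w_min=-32):
        self.tables = [[0] * table_size for _ in range(num_features)]
        self.table_size = table_size
        self.threshold = threshold
        self.w_max, self.w_min = w_max, w_min

    def _indices(self, features):
        # Each feature hashes into its own weight table.
        return [hash(f) % self.table_size for f in features]

    def predict(self, features):
        # Sum one weight per feature table; predict "reused" if the sum
        # reaches the threshold.
        total = sum(t[i] for t, i in zip(self.tables, self._indices(features)))
        return total >= self.threshold

    def train(self, features, reused):
        # Saturating update: push each selected weight toward the outcome.
        delta = 1 if reused else -1
        for t, i in zip(self.tables, self._indices(features)):
            t[i] = max(self.w_min, min(self.w_max, t[i] + delta))


# Usage: for a PBR-like policy the features could be (address, PC,
# data type, access count); the values below are hypothetical.
predictor = PerceptronReusePredictor(num_features=4)
features = [0x1F40, 0x400D2A, 0, 3]
for _ in range(4):
    predictor.train(features, reused=False)  # observe non-reuse of L1 victims
print(predictor.predict(features))           # now predicts no reuse: False
```

A real hardware implementation would replace `hash` with simple XOR/shift hashing of the feature bits and keep the weight tables in small SRAM arrays, but the predict/train structure is the same.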