| In modern processor design, the cache directly affects processor performance, and cache optimization has long been an important research topic. This thesis studies the instruction cache of the open-source RISC-V processor "Xuantie" C910 and addresses two of its weaknesses: a prefetching mechanism with low prefetch accuracy and a replacement algorithm that cannot adapt to multiple access patterns. A decoupled front end is used to improve the accuracy of instruction cache prefetching, and a dynamic replacement algorithm is introduced to manage replacement of instruction cache blocks. The main work of the thesis is as follows. First, a decoupled front-end FDIP (fetch-directed instruction prefetching) prefetcher is proposed for the C910 instruction cache. The branch predictor is separated from the instruction fetch module so that it runs ahead of instruction fetch, and its prediction results guide the instruction cache's prefetch operations. The thesis completes the hardware implementation of the decoupled front end, the prefetch detection pointer, the instruction data cache module, the control module of the FDIP instruction prefetcher, the prefetch buffer, and the prefetch flush mechanism, and verifies the correctness of the design on a simulation test platform. Second, the simple FIFO replacement algorithm of the C910 instruction cache is replaced with a dynamic replacement algorithm that suits more usage scenarios and achieves a higher hit rate. The thesis describes the principle of the dynamic replacement algorithm and implements its control module, detection pointer, and history recording module. The algorithm is scheduled dynamically by evaluating the historical and current information of the executing program: when program correlation is good, the LRU replacement algorithm is selected; when program correlation is poor, the 2Q algorithm is selected. Simulation tests with specifically selected test files then check the algorithm's adaptability to different access patterns. Finally, the thesis runs simulation tests on the SMART platform with CoreMark, Dhrystone, and other test programs, and collects processor performance data through the performance monitoring unit for analysis. The decoupled front-end FDIP instruction prefetcher improves performance by 4% with a prefetch efficiency of 85%, the dynamic replacement algorithm improves performance by 2%, and the two optimizations combined increase C910 performance by 5.4%. |
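
The selection step of the dynamic replacement algorithm can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: it assumes that "program correlation" can be approximated by a short-term reuse ratio, and the names `reuse_ratio` and `choose_policy`, the window size, the weighting, and the threshold are all hypothetical.

```python
from collections import deque


def reuse_ratio(addresses, window=8):
    """Fraction of accesses whose block was seen in the last `window` accesses.

    Hypothetical stand-in for the 'program correlation' measure.
    """
    recent = deque(maxlen=window)
    reused = 0
    for addr in addresses:
        if addr in recent:
            reused += 1
        recent.append(addr)
    return reused / len(addresses) if addresses else 0.0


def choose_policy(history_ratio, current_ratio, threshold=0.5, weight=0.5):
    """Blend historical and current information, then pick a policy.

    Good correlation selects LRU; poor correlation selects 2Q.
    The threshold and weight are assumed values for illustration.
    """
    score = weight * history_ratio + (1 - weight) * current_ratio
    return "LRU" if score >= threshold else "2Q"


# A tight loop reuses the same blocks; a linear scan never does.
loop_trace = [0, 1, 2, 3] * 4
scan_trace = list(range(16))

loop_policy = choose_policy(reuse_ratio(loop_trace), reuse_ratio(loop_trace))
scan_policy = choose_policy(reuse_ratio(scan_trace), reuse_ratio(scan_trace))
```

In this model a loop-like trace scores high and selects LRU, while a scan-like trace scores low and selects 2Q, which tolerates one-time accesses better than plain LRU.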