
Cortex-R8-based CPU Subsystem Function Verification And Performance Optimization

Posted on: 2021-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2518306050454234
Subject: Master of Engineering
Abstract/Summary:
With the advent of the artificial intelligence era, mobile electronic devices place new demands on the data processing capability of chips. Driven by market demand for high speed, high bandwidth, and high performance, multicore technology has gradually been carried over from high-performance computer systems into the chips of mobile electronic devices. As the number of CPUs on a chip grows, the performance of any single CPU can become the bottleneck that limits overall chip performance. At present, CPU performance analysis in industry is mainly performed after tapeout, using software-level tests to obtain performance data for the entire system. If performance is found to be inadequate at that stage, it is difficult to locate the cause of the bottleneck in time. How to exploit the full performance of each CPU and keep any single CPU from becoming a weak point has therefore become an increasingly important challenge in chip design. To ensure product quality, major chip developers have gradually begun to carry out CPU performance analysis and optimization before tapeout. Verification engineers therefore need to be familiar not only with the function of the module under test but also with the factors that may affect its performance, so that they can analyze chip performance before tapeout and find and correct performance shortcomings in time.

Based on a study of the ARM Cortex-R8, a multicore high-performance real-time processor commonly used in 5G baseband chips, this thesis completes the functional verification of the Cortex-R8 processor and the level-2 (L2) cache, analyzes the performance of the system, and gives a feasible optimization scheme. First, the thesis analyzes the architecture of the Cortex-R8 processor and the CPU subsystem, introduces the system's functions and performance, and studies in depth the storage system that determines the data transmission performance of the CPU subsystem. Then, based on this understanding of the functions and performance of the CPU subsystem, a UVM-based SoC verification platform was built, and the armcc compiler was integrated into the verification environment to support joint C and assembly language verification. Directed test cases written in combined C and assembly, together with random stimulus from an external verification IP, were then used to verify all test function points of the Cortex-R8 processor and the L2 cache. To ensure correct data transmission in the CPU subsystem, the thesis also adds test cases for the data interaction scenarios of each module in the storage system.

Before the performance analysis, the thesis extracted the performance parameters of the CPU subsystem and, based on these parameters, built an automatic performance monitoring platform around the PMU. The platform enables the performance monitoring unit (PMU) of the ARMv7 architecture, uses an OCP VIP as a simulated external host for the CPU subsystem, and uses Python and TCL to automate the processing and analysis of performance data. Finally, the monitoring platform was used to monitor the CPU subsystem in typical and complex application scenarios, and the subsystem was analyzed and optimized based on the monitoring results.

The functional verification of the Cortex-R8 processor and the L2 cache was completed, with code coverage and functional coverage both reaching 100% through regression testing. The automatic performance monitoring platform was then used to monitor the data transmission behavior of the CPU subsystem. Taking the L2 cache as an example, the thesis analyzes the performance parameters and proposes an optimization scheme. As a result, the average data transmission latency of the CPU subsystem was reduced by 35.97% and the maximum throughput was increased by 39.34%. The optimized L2 cache and the entire CPU subsystem have been delivered and taped out successfully. In addition, the PMU-based processor performance monitoring platform built in this thesis has been reused in other similar projects.
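
The abstract reports its results as relative changes in average data transmission latency and maximum throughput, and mentions Python for automated post-processing. The following is a minimal sketch of how such metrics might be computed from per-transfer records. It is not the thesis's actual tooling: the `Transfer` record, its field names, the fixed-window throughput definition, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transfer:
    """One bus transfer as a monitor might log it (hypothetical record layout)."""
    start_cycle: int  # cycle in which the request was issued
    end_cycle: int    # cycle in which the transfer completed
    num_bytes: int    # payload size in bytes


def average_latency(transfers: List[Transfer]) -> float:
    """Mean issue-to-completion latency in cycles."""
    return sum(t.end_cycle - t.start_cycle for t in transfers) / len(transfers)


def max_throughput(transfers: List[Transfer], window: int) -> float:
    """Peak bytes per cycle over fixed, non-overlapping windows of `window` cycles.

    Each transfer is attributed to the window in which it completes
    (an assumed definition; other definitions are possible).
    """
    per_window: Dict[int, int] = {}
    for t in transfers:
        idx = t.end_cycle // window
        per_window[idx] = per_window.get(idx, 0) + t.num_bytes
    return max(per_window.values()) / window


def percent_change(before: float, after: float) -> float:
    """Signed relative change in percent; a negative value is a reduction."""
    return (after - before) / before * 100.0


# Toy data invented for illustration only; these are not measurements from the thesis.
baseline = [Transfer(0, 40, 64), Transfer(10, 60, 64), Transfer(100, 150, 64)]
optimized = [Transfer(0, 30, 64), Transfer(10, 40, 64), Transfer(50, 80, 64)]

lat_delta = percent_change(average_latency(baseline), average_latency(optimized))
thr_delta = percent_change(max_throughput(baseline, 100), max_throughput(optimized, 100))
print(f"average latency change: {lat_delta:.2f}%")
print(f"max throughput change: {thr_delta:.2f}%")
```

Under these assumptions, a latency reduction shows up as a negative percentage and a throughput gain as a positive one, which matches the way the 35.97% and 39.34% figures are stated above.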
Keywords/Search Tags: Cortex-R8, CPU subsystem, L2 cache, PMU, Verification