The demand for data processing by AI (Artificial Intelligence) is rising. General-purpose graphics processing units cannot deliver optimal performance and power consumption for specific tasks. Dedicated chips, especially accelerators tailored to particular application scenarios, provide parallelism and customized processing flows that achieve the best energy efficiency, and they have become the mainstream solution for current mobile terminal products. On the other hand, as design scale grows and manufacturing process nodes advance, AI accelerator chips are becoming increasingly complex, and the likelihood of design errors rises sharply. The accuracy, completeness and stability of chip verification have therefore become a focus of chip research and development. The hardware implementation of a CNN (Convolutional Neural Network) mostly generates register-transfer-level (RTL) circuits through a high-level synthesis language. However, this approach has limited support for synthesizable RTL and also poses major challenges for product iteration and engineering reusability.

Based on the hardware description language SystemVerilog, a CNN accelerator with forward-operation functionality is investigated, designed and implemented. Both UVM (Universal Verification Methodology) simulation and FPGA board-level verification are provided to ensure functional correctness, to facilitate the development and application of the CNN, and to enable rapid deployment on embedded platforms. Combining the advantages of SystemVerilog and the UVM verification methodology, this paper analyzes the platform structure, core mechanisms and register model of UVM, and introduces the interface protocol as a reference for formulating the stimulus sequences of verification test cases. This paper investigates and designs a CNN accelerator for IoT (Internet of Things) terminal applications and uses this accelerator as the verification object to set up both a simulation verification platform and a prototype verification platform. It also introduces the characteristics of CNNs, analyzes their forward-propagation workflow, and then studies the functions of the accelerator, including module data flow and convolution calculation, the parameter adaptation of the convolution layers of commonly used algorithm models, and the functional configuration of each module's registers.

Following the UVM verification methodology and the design specification, the general verification components of the platform are constructed, communication between the components is implemented, a register model corresponding to the registers of the module under test is generated, and the verification platform for the hardware accelerator module is built. The analysis focuses on test-case development, read and write access operations through the register model, implementation of the reference model, and automatic comparison of results in the scoreboard. Through a large number of test cases, the accelerator module is fully simulated and verified, coverage is collected, and the code of the simulation verification platform is optimized, which shortens simulation time and improves the reusability of the verification platform.

Moreover, a board-level test environment is set up. Using the stimulus generated by the test cases of the simulation verification platform, a debugger converts the data into the JTAG protocol and then loads all of the data into the accelerator module for execution, which finally confirms the correctness of the simulation platform. When the accelerator reads multi-channel data, a fixed-priority polling arbiter is applied to solve the arbitration fairness problem and to save FPGA resources and power consumption. The simulation verification and board-level test results show that the design of the proposed hardware accelerator is reasonable and that the verification platform built is efficient and correct. All function-point tests are completed, and the code coverage rate reaches 100%. The calculation results of all test cases were compared, and they are correct and meet the design expectations. The design code is synthesized into gate-level circuits, which are optimized to generate a netlist file based on the process library.
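The verification flow summarized above depends on a reference model whose results the scoreboard compares automatically against the accelerator's output. The Python sketch below illustrates that idea under stated assumptions and does not reproduce the thesis's implementation. It assumes a single-channel integer feature map, no padding and a configurable stride. The function names `conv2d_reference` and `scoreboard_compare` are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_reference(ifmap: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Hypothetical golden model: single-channel 2D convolution as used in CNNs
    (cross-correlation, no kernel flip), no padding, integer arithmetic."""
    h, w = len(ifmap), len(ifmap[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    ofmap = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            # Multiply-accumulate over the kernel window at this output position.
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += ifmap[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            row.append(acc)
        ofmap.append(row)
    return ofmap


def scoreboard_compare(expected: Matrix, actual: Matrix) -> List[str]:
    """Hypothetical scoreboard check: compare the DUT output with the reference
    output element by element. An empty list means the test passed."""
    if len(expected) != len(actual) or any(
        len(e) != len(a) for e, a in zip(expected, actual)
    ):
        return ["shape mismatch"]
    errors = []
    for y, (erow, arow) in enumerate(zip(expected, actual)):
        for x, (e, a) in enumerate(zip(erow, arow)):
            if e != a:
                errors.append(f"mismatch at ({y},{x}): expected {e}, got {a}")
    return errors


if __name__ == "__main__":
    ifmap = [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]]
    kernel = [[1, 0], [0, 1]]
    expected = conv2d_reference(ifmap, kernel)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
    # Stand-in for data read back from the accelerator (assumed identical here).
    dut_output = [row[:] for row in expected]
    errors = scoreboard_compare(expected, dut_output)
    print("PASS" if not errors else "\n".join(errors))  # prints "PASS"
```

The comparison uses exact integer equality because the sketch assumes integer or fixed-point arithmetic. A floating-point reference model would instead need a tolerance when comparing values.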