In recent years, with the rapid development of neural network theory and algorithms, convolutional neural networks (CNNs) have shown great advantages in machine vision thanks to their powerful two-dimensional feature-processing and good generalization capabilities. However, CNN algorithms are computationally expensive; when they are deployed on embedded systems with tight power and cost budgets, general-purpose processors cannot provide sufficient compute. To improve the efficiency of inference tasks in embedded systems, this thesis designs a convolutional neural network processor based on the RISC-V ISA. The processor meets the compute requirements of forward-inference tasks while retaining low power consumption and low cost. The main work of this thesis is as follows:

(1) Research and design of a dynamically scheduled, out-of-order RISC-V processor core. The core, Sparrow RV-EX, is compatible with the RV32IMABC instruction set and adopts a 5- to 7-stage dynamically scheduled pipeline. It supports out-of-order issue and out-of-order writeback, using a scoreboard to resolve pipeline hazards. The branch prediction unit combines a 2-bit saturating counter with a return address stack, predicting both branch direction and branch target address. A dynamic iterative divider speeds up the execution of division instructions.

(2) Research and design of a dedicated hardware accelerator for convolutional neural networks. A row-parallel scheme is used for the convolutional layers. The Booth algorithm and a Wallace tree reduce the power and area of the multiply-accumulate units. A dedicated convolution storage controller improves data throughput, and the pooling and activation modules are attached to the end of the convolutional pipeline to avoid the overhead of repeated data accesses.

(3) Construction of a convolutional neural network processor system. The processor core
interacts with the accelerator through a custom instruction interface. An internal bus provides hierarchical interconnection of the processor core, the accelerator, and the various peripheral modules. A debugging mechanism offers an efficient tool for analyzing hardware and software defects, and the software layer encapsulates the underlying hardware operations in a board support package, improving the development efficiency of upper-layer applications.

(4) Experimental analysis of the accelerator's functional correctness and energy efficiency, and evaluation of the system's area and cost. An FPGA prototyping platform is used to verify functional correctness and system performance; the accelerator achieves a throughput of 92.2 GOPS at 100 MHz. After logic synthesis at a 130 nm process node, the dynamic power consumption of the accelerator system is 142.9 mW and the cell area is 701,685.7 μm². The results show that the accelerator's energy efficiency is good, the cell area is at the microprocessor level, and the cost and power figures meet the requirements of embedded systems.
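The two branch-prediction structures named in work item (1) can be sketched behaviorally. This is an illustrative model, not the thesis RTL; the counter's initial state and the stack depth are assumptions:

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""
    def __init__(self):
        self.state = 1  # weakly not-taken (initial state is an assumption)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at 0 and 3 so one mispredict never flips a strong state.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class ReturnAddressStack:
    """Pushed on call, popped on return, so return targets are predicted exactly."""
    def __init__(self, depth=8):  # depth is an illustrative assumption
        self.stack, self.depth = [], depth

    def push(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overwrite the oldest entry on overflow
        self.stack.append(return_pc)

    def pop(self):
        return self.stack.pop() if self.stack else None
```

The saturating states give hysteresis: a loop branch that is taken many times in a row survives the single not-taken exit without losing its taken prediction.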
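The scoreboard mentioned in work item (1) can likewise be sketched as a set of busy destination registers; an instruction issues only when its operands are free. This minimal model (register names and the single busy set are assumptions) shows the RAW and WAW checks that allow out-of-order issue and writeback:

```python
class Scoreboard:
    """Tracks destination registers of in-flight instructions."""
    def __init__(self):
        self.busy = set()

    def can_issue(self, srcs, dest):
        # RAW hazard: a source register is still being produced.
        # WAW hazard: the destination already has a pending writer.
        return not (set(srcs) & self.busy) and dest not in self.busy

    def issue(self, dest):
        if dest != 0:  # x0 is hardwired to zero in RISC-V
            self.busy.add(dest)

    def writeback(self, dest):
        self.busy.discard(dest)
```

An instruction whose operands are all ready may issue ahead of an older stalled one, which is exactly what permits out-of-order issue.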
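The Booth recoding in work item (2) halves the number of partial products that the Wallace tree must then compress. A software sketch of radix-4 Booth multiplication (bit width and function name are illustrative; real hardware generates the partial products in parallel rather than in a loop):

```python
def booth_radix4(m, y, n=16):
    """Multiply m * y by summing radix-4 Booth partial products.

    y is treated as an n-bit two's-complement multiplier. Each Booth
    digit d = b[2i-1] + b[2i] - 2*b[2i+1] lies in {-2,-1,0,1,2}, so only
    n/2 partial products are generated instead of n, each a cheap
    shift/negate of m.
    """
    assert -(1 << (n - 1)) <= y < (1 << (n - 1)), "y must fit in n bits"

    def bit(k):
        return 0 if k < 0 else (y >> k) & 1  # Python >> sign-extends

    product = 0
    for i in range(n // 2):
        d = bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
        product += (d * m) << (2 * i)  # one Booth partial product
    return product
```

In the accelerator described here, these n/2 partial products would feed a Wallace tree of carry-save adders, so the only carry-propagate addition happens once at the end.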
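From the figures reported in work item (4), an energy-efficiency number can be derived (this ratio is computed here, not stated in the abstract):

```python
# Energy efficiency implied by the reported results:
# throughput divided by post-synthesis dynamic power.
gops = 92.2            # accelerator throughput at 100 MHz, in GOPS
power_w = 142.9e-3     # dynamic power at the 130 nm node, in watts

efficiency = gops / power_w
print(f"{efficiency:.1f} GOPS/W")  # prints "645.2 GOPS/W"
```

Roughly 645 GOPS/W of dynamic-power efficiency is what supports the abstract's claim that the accelerator suits power-constrained embedded systems.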