
Extended Programmable Neural Network Acceleration System of Cortex-M3

Posted on: 2020-06-10    Degree: Master    Type: Thesis
Country: China    Candidate: K G Yang    Full Text: PDF
GTID: 2428330602450401    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
With the improvement of process technology, computing capability has increased rapidly. Artificial intelligence based on neural networks is widely used in image processing, control systems, pattern recognition, financial management, and other fields. Because deep learning depends heavily on computing capability, most neural networks are currently trained and run on CPUs or GPUs. However, as network structures change and real-time computing requirements grow, these traditional implementations will struggle to satisfy future applications, and many dedicated acceleration circuits have appeared in recent years; ASICs have therefore become an important path for deep learning. By application scenario, acceleration circuits can be divided into two categories: the server cloud and edge computing. Lightweight acceleration circuits for embedded terminals are one trend of development. This paper therefore uses the Cortex-M3 processor IP provided by ARM's DesignStart program and designs an ARM system-on-chip that integrates a programmable neural network acceleration unit for embedded terminals. The main work of this paper is as follows.

(1) The development and current state of neural networks are introduced, and the advantages and disadvantages of software and hardware implementations of neural networks are compared. The inference and backward propagation of the BP neural network algorithm are described in detail, and an acceleration circuit for the BP neural network is designed according to the inference process (a fixed-point sketch of the per-layer computation follows the abstract).

(2) By comparing the advantages and disadvantages of various methods for fitting the neural network activation function, a 6-segment piecewise-linear fit with slopes of 2^-n is chosen, so that division operations can be replaced by shift operations. This reduces computational complexity while keeping the accuracy loss on handwritten digit recognition below 0.2% (see the shift-based sketch below).

(3) Based on the characteristics of the neural network parameters during inference, a distributed buffer and a dynamic ping-pong buffer are designed to optimize the storage architecture of the acceleration system. The write-bandwidth requirement on the external interface is reduced by 93.75%, and network acceleration can proceed in parallel with data reads and writes (a ping-pong sketch follows below).

(4) The neural network accelerator and the Cortex-M3 processor IP are integrated using a system-on-chip design method, so that the accelerator can perform feedforward inference acceleration for different network topologies through processor software programming (a register-programming sketch follows below).

(5) Based on the acceleration system designed in this paper, a handwritten digit recognition neural network is implemented on an FPGA platform. Compared with the simulation test results from MATLAB and C, the acceleration system designed in this paper consumes 2.8 W and is nearly 10 times faster than the CPU implementation.

The extended programmable neural network acceleration system of Cortex-M3 designed in this paper combines the processor with acceleration peripherals, giving the whole system low power consumption, low bandwidth requirements, high parallelism, and a configurable network structure. It basically meets the design goal of real-time neural network inference in embedded application scenarios.
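For point (1), the core of feedforward inference is a multiply-accumulate loop per layer. The following is a minimal C sketch of that computation; the Q15 fixed-point format, the function name, and the accumulator width are illustrative assumptions, not the thesis's actual datapath.

    #include <stdint.h>

    /* One fully connected layer: out = act(W * in + b), Q15 fixed point.
     * A real design would also guard against accumulator overflow. */
    void fc_layer(const int16_t *w, const int16_t *b, const int16_t *in,
                  int16_t *out, int n_in, int n_out,
                  int16_t (*act)(int32_t))
    {
        for (int o = 0; o < n_out; o++) {
            int32_t acc = (int32_t)b[o] << 15;            /* bias widened to Q30 */
            for (int i = 0; i < n_in; i++)
                acc += (int32_t)w[o * n_in + i] * in[i];  /* Q15 x Q15 = Q30 MAC */
            out[o] = act(acc >> 15);                      /* back to Q15, activate */
        }
    }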
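For point (2), the key idea is that a segment slope of 2^-n turns the multiply into a right shift. The sketch below illustrates this with the well-known PLAN sigmoid approximation (three segments plus saturation per side, exploiting symmetry); the thesis fits its own six segments, so these breakpoints and offsets are stand-ins, not the fitted coefficients.

    #include <stdint.h>

    /* Piecewise-linear sigmoid, Q15 in/out. Every slope is a power of
     * two, so hardware needs only shifts and adds, no divider. */
    int16_t pwl_sigmoid_q15(int32_t x)
    {
        int32_t ax = x < 0 ? -x : x;             /* sigmoid(-x) = 1 - sigmoid(x) */
        int32_t y;

        if (ax < 1 * 32768)                      /* |x| < 1.0   */
            y = 16384 + (ax >> 2);               /* 0.5 + |x|/4     (slope 2^-2) */
        else if (ax < (19 * 32768) / 8)          /* |x| < 2.375 */
            y = 20480 + (ax >> 3);               /* 0.625 + |x|/8   (slope 2^-3) */
        else if (ax < 5 * 32768)                 /* |x| < 5.0   */
            y = 27648 + (ax >> 5);               /* 0.84375 + |x|/32 (slope 2^-5) */
        else
            y = 32767;                           /* saturate at ~1.0 */

        return (int16_t)(x < 0 ? 32768 - y : y); /* mirror for negative inputs */
    }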
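For point (3), a ping-pong buffer lets the bus fill one buffer while the accelerator computes from the other, which is how data movement overlaps with computation. This is a sketch of the control flow only; the buffer size and the accel_start/accel_wait/dma_load calls are hypothetical names, not the thesis's driver API.

    #include <stdint.h>

    #define BUF_WORDS 4096
    static int16_t wbuf[2][BUF_WORDS];          /* ping-pong weight buffers */

    extern void accel_start(const int16_t *w, int layer);  /* hypothetical */
    extern void accel_wait(void);                          /* hypothetical */
    extern void dma_load(int16_t *dst, int layer);         /* hypothetical */

    void run_network(int n_layers)
    {
        dma_load(wbuf[0], 0);                   /* preload first layer */
        for (int k = 0; k < n_layers; k++) {
            accel_start(wbuf[k & 1], k);        /* compute from one buffer */
            if (k + 1 < n_layers)
                dma_load(wbuf[(k + 1) & 1], k + 1); /* fill the other in parallel */
            accel_wait();                       /* sync before swapping */
        }
    }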
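For point (4), "programmable through processor software" typically means the Cortex-M3 writes the network topology into the accelerator's memory-mapped registers before starting inference. The base address and register layout below are invented for illustration; the real register map is defined by the thesis's SoC design.

    #include <stdint.h>

    #define ACCEL_BASE 0x40010000u              /* hypothetical peripheral address */
    typedef volatile struct {
        uint32_t layer_cnt;                     /* number of layers */
        uint32_t layer_dim[8];                  /* neurons per layer */
        uint32_t ctrl;                          /* bit0 = start */
        uint32_t status;                        /* bit0 = done */
    } accel_regs_t;
    #define ACCEL ((accel_regs_t *)ACCEL_BASE)

    /* Program a feedforward topology and run one inference pass. */
    void accel_run(const uint32_t *dims, uint32_t n)
    {
        ACCEL->layer_cnt = n;
        for (uint32_t i = 0; i < n; i++)
            ACCEL->layer_dim[i] = dims[i];      /* topology set by software */
        ACCEL->ctrl = 1u;                       /* kick off inference */
        while (!(ACCEL->status & 1u))           /* poll for completion */
            ;
    }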
Keywords/Search Tags: neural network, SoC, ARM, FPGA