With the rapid development of deep convolutional neural network (DCNN) algorithms, DCNNs have been widely applied in image recognition, medical diagnosis, and other fields. Deep convolutional neural networks are usually implemented on CPUs, GPUs, ASICs, FPGAs, and similar platforms. Existing DCNN processors are mainly designed for high-end applications such as autonomous vehicles, data centers, and smartphones, where the design focus is performance; for IoT applications, power and cost are more important. In addition, programmability is especially important for DCNN processors so that they can support different deep convolutional neural network algorithms. This thesis presents a cost- and power-efficient programmable DCNN processor dedicated to IoT applications.

Firstly, this thesis surveys current research on DCNN processors at home and abroad and introduces the latest achievements in implementing DCNN processors on different platforms. Based on the application requirements of such processors, a cost- and power-efficient programmable DCNN processor is proposed.

Secondly, this thesis introduces the basic concepts, hardware implementation, and parallelism analysis of the deep convolutional neural network algorithm. The feasibility of implementing the convolution layer, pooling layer, activation function, and fully connected layer in hardware is analyzed at the theoretical level. At the same time, the parallel computing design of the deep convolutional neural network processor is analyzed along three parallel dimensions: convolution kernel parallelism, input channel parallelism, and output channel parallelism (a loop sketch illustrating these dimensions is given at the end of this abstract), and three parameters for measuring processor performance are proposed.

Thirdly, in the hardware design of the programmable deep neural network processor, this thesis adopts five optimization techniques. In the low-power design of the processor, a "cluster"-based "S"-shaped read strategy and a data-multiplexing technique are adopted to maximize data reuse, reducing the number of memory reads and thus the power consumption. The accumulation of intermediate feature maps is completed with a "feature map" accumulation method, which reduces the number of times the input feature map is reloaded and therefore lowers power consumption. Near-zero-value filtering is combined with a zero-value skipping technique to shield the transfer and computation of zero-valued data, further reducing power consumption (a code sketch of this idea is also given below). In the cost-efficient and programmable design of the processor, this thesis adopts a programmable layer-processing computing architecture: all layers of the DCNN are completed by reusing the same layer-processing computing architecture, which reduces hardware resources and design cost, while its programmability also increases the flexibility of the processor. The processor further adopts a "row"-type data storage structure to accelerate data reading, strike a balance between data reading and computation, and improve the overall speed of the processor.

Finally, simulation results are given using Vivado 2017.1 as the development and simulation tool. The Xilinx Virtex-7 FPGA VC707 evaluation kit was used for hardware verification, and the accuracy and performance of the FPGA-based programmable DCNN processor are analyzed. Results of 31.01 GOPS/W and 0.22 GOPS/DSP are obtained, which are superior to several existing DCNN processors. At the same time, this thesis summarizes the proposed programmable DCNN processor and puts forward suggestions for later optimization.
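
The following C sketch is a rough software analogue of the three parallel dimensions mentioned above. It is not the processor's hardware description; all array sizes and identifiers are illustrative assumptions. In hardware, the annotated loops would be unrolled into parallel multiply-accumulate units rather than executed sequentially.

    /* Sketch of a convolution-layer loop nest, annotating the three parallel
     * dimensions. Sizes K, C_IN, C_OUT, H, W are assumed for illustration. */
    #define K     3   /* convolution kernel size (assumed) */
    #define C_IN  8   /* input channels (assumed) */
    #define C_OUT 8   /* output channels (assumed) */
    #define H     16  /* output feature map height (assumed) */
    #define W     16  /* output feature map width (assumed) */

    void conv_layer(const float in[C_IN][H + K - 1][W + K - 1],
                    const float wgt[C_OUT][C_IN][K][K],
                    float out[C_OUT][H][W])
    {
        for (int oc = 0; oc < C_OUT; oc++)                 /* output-channel parallelism */
            for (int y = 0; y < H; y++)
                for (int x = 0; x < W; x++) {
                    float acc = 0.0f;
                    for (int ic = 0; ic < C_IN; ic++)      /* input-channel parallelism */
                        for (int ky = 0; ky < K; ky++)
                            for (int kx = 0; kx < K; kx++) /* kernel parallelism */
                                acc += in[ic][y + ky][x + kx] * wgt[oc][ic][ky][kx];
                    out[oc][y][x] = acc;
                }
    }

Which of the three loops is unrolled in hardware, and by how much, determines the trade-off between DSP usage and throughput reflected in metrics such as GOPS/DSP.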
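
Similarly, a minimal sketch of near-zero-value filtering combined with zero-value skipping is given below, assuming a simple magnitude threshold. The function names and threshold are hypothetical and only illustrate the idea of shielding zero-valued data from transfer and computation.

    #include <math.h>

    /* Clamp small activations to zero ("near-zero-value filtering");
     * the threshold is an assumed parameter, not a value from the thesis. */
    static inline float filter_near_zero(float v, float threshold)
    {
        return (fabsf(v) < threshold) ? 0.0f : v;
    }

    /* Multiply-accumulate over one dot product, skipping zero activations. */
    float mac_skip_zero(const float *act, const float *wgt, int n, float threshold)
    {
        float acc = 0.0f;
        for (int i = 0; i < n; i++) {
            float a = filter_near_zero(act[i], threshold);
            if (a == 0.0f)     /* zero-value skipping: no weight fetch, no multiply */
                continue;
            acc += a * wgt[i];
        }
        return acc;
    }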