
Design And Optimization Of Shifted Convolutional Neural Network Based On FPGA Platform

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: P Guo
Full Text: PDF
GTID: 2518306764995959
Subject: Automation Technology

Abstract:
As one of the key technologies of computer vision, the Convolutional Neural Network (CNN) has been widely applied in many fields. However, the increasing depth of CNNs demands large amounts of computing and memory resources. At present, GPUs, with their high computing power, high parallelism, and high power consumption, are commonly used as the computing platform. With the construction of the Internet of Things ecosystem, however, it is clearly difficult for GPUs to meet the low-power requirements of terminal nodes. In recent years, research on hardware acceleration based on ASICs, FPGAs, and other low-power platforms has advanced steadily. The FPGA, as a highly parallel, low-cost, reprogrammable, and low-power chip, offers great advantages for network acceleration.

To meet the requirements of deploying convolutional neural networks on terminal nodes, this thesis, following the idea of software-hardware co-design, designs a dedicated network acceleration platform capable of independent inference at the terminal node by analyzing the computational characteristics of CNNs and of the FPGA platform. To improve the computational efficiency of the FPGA platform, the weights and activation values are quantized to fixed point. On the hardware side, to reduce cache occupancy, the quantized network proposed in this thesis is customized for the FPGA platform. The specific schemes are as follows:

1. Through shift quantization, the multiply-and-accumulate operations in the convolutional neural network are replaced by shift-and-accumulate operations, which improves computation speed and reduces computational cost while maintaining accuracy. To avoid loss of precision, this thesis adopts strategies of group quantization and stepwise quantization. In group quantization, weights are grouped according to thresholds, and floating-point weights are used to assist quantization. In stepwise quantization, weight quantization and activation-value quantization are separated to avoid non-optimal local minima during training.

2. The hardware acceleration design is divided into two parts: data-path design and dedicated-module design. In the data-path part, to reduce cache occupancy, four components are designed: block convolution, data caching, data pre-reading, and data transmission. In the dedicated-module part, the main CNN layers (convolutional, pooling, and fully connected) are implemented as custom modules, and the modules are then integrated.

The acceleration scheme designed in this thesis adopts a layer-by-layer computation mode and has been tested and verified on the CIFAR-10 dataset. The results show that the image classification speed reaches 132.23 FPS at a clock frequency of 100 MHz. Compared with the original network running on an ARM CPU, the accelerator achieves a 95.85-fold speedup, with a power consumption of 2.034 W and a DSP utilization of only 4.09%, which greatly saves on-chip computing resources.
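The core of scheme 1 is replacing each weight by a signed power of two so that a weight-activation multiply becomes a bit shift. The sketch below illustrates that idea in NumPy; it is a minimal, hypothetical illustration, not the thesis implementation (the grouping-by-threshold and stepwise weight/activation stages described above are omitted), and the function names are assumptions.

```python
import numpy as np

def shift_quantize(w, n_bits=4):
    """Map each weight to sign(w) * 2**e, with e the nearest integer
    to log2(|w|), clipped to the range representable in n_bits.
    Returns the signs and integer exponents (the "shift amounts")."""
    sign = np.sign(w)
    e = np.round(np.log2(np.abs(w) + 1e-12))
    e = np.clip(e, -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return sign, e.astype(int)

def shift_mac(x, sign, e):
    """Multiply-accumulate via shifts: sum(sign * x * 2**e).
    On an FPGA the 2**e factor is a barrel shift rather than a DSP
    multiply; here np.ldexp(x, e) emulates that scaling."""
    return float(np.sum(sign * np.ldexp(x, e)))

w = np.array([0.24, -0.51, 0.12, 1.9])   # toy floating-point weights
x = np.array([1.0, 2.0, 3.0, 4.0])       # toy activations
s, e = shift_quantize(w)
approx = shift_mac(x, s, e)              # shift-and-accumulate result
exact = float(np.dot(x, w))              # reference multiply-accumulate
```

With these toy values the quantized exponents are [-2, -1, -3, 1], so the dot product 7.18 is approximated by 7.625 using only shifts and adds, which is the accuracy/cost trade-off the grouping and stepwise strategies are meant to control.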
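The block-convolution and data pre-reading components of scheme 2 can be pictured as tiling the output feature map so that only a small input window and one kernel need to reside in the on-chip cache at a time. The following is a minimal software sketch of that tiling pattern under assumed simplifications (single channel, stride 1, no padding); it is an illustration of the general technique, not the thesis's hardware design.

```python
import numpy as np

def conv2d_blocked(x, w, tile=2):
    """2D convolution computed output-tile by output-tile.

    For each tile, only the input window covering that tile (the
    "pre-read" block) is sliced out, emulating a small on-chip buffer
    instead of caching the whole feature map."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((oh, ow))
    for r0 in range(0, oh, tile):
        for c0 in range(0, ow, tile):
            r1, c1 = min(r0 + tile, oh), min(c0 + tile, ow)
            # Input window needed to produce this output tile.
            win = x[r0:r1 + kh - 1, c0:c1 + kw - 1]
            for r in range(r1 - r0):
                for c in range(c1 - c0):
                    y[r0 + r, c0 + c] = np.sum(win[r:r + kh, c:c + kw] * w)
    return y

x = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 feature map
w = np.ones((3, 3))                           # toy 3x3 kernel
y = conv2d_blocked(x, w, tile=2)              # 4x4 output, computed in 2x2 tiles
```

The buffer footprint per tile is (tile + kh - 1) x (tile + kw - 1) input values plus one kernel, independent of the full feature-map size, which is why tiling reduces cache occupancy on the FPGA.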
Keywords/Search Tags: Shift CNN, Hardware acceleration, FPGA, Software-hardware co-design