
Implementation And Verification Of Caffe Deep Learning Architecture Based On FPGA

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Min
Full Text: PDF
GTID: 2518306050954129
Subject: Master of Engineering
Abstract/Summary:
As an important branch of modern artificial intelligence, Deep Learning (DL) has been widely applied in pattern recognition, natural language processing, machine vision, and many other areas. The Convolutional Neural Network (CNN) is a deep learning algorithm inspired by the visual cortex of the biological brain and is known for its high classification and recognition accuracy. Caffe was the first industrial-grade deep learning framework. The Field Programmable Gate Array (FPGA) is well suited to the efficient computation of CNNs because of the large amount of parallelism these networks contain, and a great deal of development work now targets CNNs on FPGAs, since FPGAs offer low power consumption, a customizable and reprogrammable structure, and excellent parallel-computing performance. However, most current deep learning frameworks support no general computing back end other than the CPU (Central Processing Unit) and GPU (Graphics Processing Unit). This greatly increases the difficulty of deep learning on FPGAs: designers must design and implement each model anew, verify the correctness of the network, and optimize its performance, and cannot simply reuse existing work.

CNNs are computationally intensive algorithms, which is reflected above all in the large number of multiply-accumulate operations in the convolutional layers. These multiply-accumulate operations are a major factor in the overall efficiency of the algorithm, prompting researchers to reduce the amount of computation required in the convolutional layer. Many research results have significantly improved GPU implementations of CNNs, shortening classification and training times. Through these improvements, many deep learning frameworks can accelerate convolution on CPUs and GPUs, but few accelerated convolution implementations exist for FPGAs.

Against this background, this thesis first analyzes the parallelism in convolutional neural networks and in OpenCL, covering the parallelism of the convolution operation itself, filter parallelism, and OpenCL optimization strategies such as compute-unit replication, data parallelism, and task parallelism. Secondly, it analyzes an FPGA implementation of the Winograd convolution algorithm and describes the optimization of Winograd convolution, verifying theoretically that the algorithm can effectively reduce the consumption of various on-chip FPGA resources. Thirdly, the design uses the FPGA as the acceleration device for computation and the CPU as the host for control, with a PCIe interface providing communication between the host and the FPGA, and implements the convolutional neural network on this heterogeneous platform. Finally, this thesis designs a modified version of the Caffe deep learning framework with FPGA support, so that a convolutional neural network model written for Caffe can be executed on an FPGA. The modified framework allows the FPGA device to be flexibly reprogrammed when necessary, handles memory transactions between the host and the device seamlessly, provides an easy-to-use test platform, and introduces a pipeline layer for inter-layer communication.

This work is validated in the Xilinx SDAccel development environment. An FPGA-based Winograd convolution engine is built, and the FPGA layer is shown to cooperate with other layers running on the host processor to run several popular convolutional neural networks. The results show that the implementation achieves 53 GFLOPS on 3 × 3 convolution kernels with a uniform stride. This is an overall FPGA implementation of the Caffe deep learning architecture, including adaptation of the framework, the addition of a Caffe Brew option (OCL), storage synchronization, and enhanced storage flags, rather than an implementation of one specific convolutional neural network.
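The Winograd convolution discussed above trades multiplications for cheap additions, which is what saves DSP resources on an FPGA. As a concrete illustration (a generic sketch using the standard minimal-filtering transforms, not the thesis's FPGA engine), the 1-D case F(2,3) computes 2 outputs of a 3-tap correlation with 4 multiplications instead of 6; the 2-D 3 × 3 case used in CNNs nests the same transforms.

```python
import numpy as np

# Winograd minimal filtering F(2,3): 2 outputs of a 1-D 3-tap
# correlation using 4 elementwise multiplications instead of 6.
# B_T, G, A_T are the standard F(2,3) transform matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 output samples."""
    U = G @ g        # filter transform (precomputable per filter)
    V = B_T @ d      # input transform
    M = U * V        # the 4 multiplications
    return A_T @ M   # output transform

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
# Direct sliding-window correlation, for comparison:
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

Because the filter transform `G @ g` can be precomputed once per filter, only the input and output transforms (additions and shifts) plus the 4 multiplications remain per tile, which is the saving the thesis exploits on chip.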
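The pipeline layer for inter-layer communication mentioned above can be sketched as a chain of stages connected by bounded queues, so that a host-side layer can work on one batch while a device-side layer works on the next. This is a minimal host-side sketch under assumed semantics: the stage functions and the `pipeline` helper are hypothetical stand-ins, not Caffe or SDAccel APIs.

```python
import threading
import queue

def run_stage(fn, inq, outq):
    """One pipeline stage: consume items, apply fn, forward results.
    A None sentinel shuts the stage down and is propagated downstream."""
    while True:
        item = inq.get()
        if item is None:
            outq.put(None)
            break
        outq.put(fn(item))

def pipeline(stages, inputs, maxsize=2):
    """Run each stage in its own thread, linked by bounded queues,
    so stage i+1 overlaps with stage i (like host/FPGA layer overlap)."""
    queues = [queue.Queue(maxsize) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=run_stage,
                                args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()

    def feed():  # feed concurrently so bounded queues never deadlock
        for x in inputs:
            queues[0].put(x)
        queues[0].put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []
    while (y := queues[-1].get()) is not None:
        results.append(y)
    feeder.join()
    for t in threads:
        t.join()
    return results

# Two mock layers, e.g. "conv on the device" then "ReLU on the host":
out = pipeline([lambda x: x * 2, lambda x: max(x, 0)], [1, -3, 5])
```

The bounded queues play the role of the host/device buffers: they provide backpressure, and the sentinel gives a clean shutdown path through every stage.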
Keywords/Search Tags:CNN, SDAccel, Parallel acceleration, FPGA