
Implementation And Verification Of Caffe Deep Learning Architecture Based On FPGA

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Min
Full Text: PDF
GTID: 2518306050954129
Subject: Master of Engineering
Abstract/Summary:
As an important branch of modern artificial intelligence, Deep Learning (DL) has been widely applied in pattern recognition, natural language processing, machine vision, and many other areas. The Convolutional Neural Network (CNN) is a deep learning algorithm inspired by the visual cortex of the biological brain and is known for its high classification and recognition accuracy. Caffe was the first industrial-grade deep learning framework. The Field Programmable Gate Array (FPGA) is well suited to the efficient computation of CNNs because of the large amount of parallelism these networks contain, and a great deal of development work now targets CNNs on FPGAs, since FPGAs offer low power consumption, a customizable and reprogrammable structure, and excellent parallel-computing performance. However, most current deep learning frameworks support no general computing back end other than the CPU (Central Processing Unit) and GPU (Graphics Processing Unit). This greatly increases the difficulty of deep learning on FPGAs: designers must design and implement each model anew, verify the correctness of the network, and optimize its performance, and cannot simply reuse existing work.

CNNs are computationally intensive algorithms, which is reflected above all in the large number of multiply-accumulate operations in the convolutional layers. These multiply-accumulate operations are a major factor in the overall efficiency of the algorithm, prompting researchers to reduce the amount of computation required in the convolutional layer. Many research results have significantly improved GPU implementations of CNNs, shortening classification and training times. Through these improvements, many deep learning frameworks can accelerate convolution on CPUs and GPUs, but few accelerated convolution implementations exist for FPGAs.

Against this background, this thesis first analyzes the parallelism in convolutional neural networks and in OpenCL, covering the parallelism of the convolution operation itself, filter parallelism, and OpenCL optimization strategies such as compute-unit replication, data parallelism, and task parallelism. Secondly, it analyzes an FPGA implementation of the Winograd convolution algorithm and describes the optimization of Winograd convolution, verifying theoretically that the algorithm can effectively reduce the consumption of various on-chip FPGA resources. Thirdly, the design uses the FPGA as the acceleration device for computation and the CPU as the host for control, with a PCIe interface providing communication between the host and the FPGA, and implements the convolutional neural network on this heterogeneous platform. Finally, this thesis designs a modified version of the Caffe deep learning framework with FPGA support, so that a convolutional neural network model written for Caffe can be executed on an FPGA. The modified framework allows the FPGA device to be flexibly reprogrammed when necessary, handles memory transactions between the host and the device seamlessly, provides an easy-to-use test platform, and introduces a pipeline layer for inter-layer communication.

This work is validated in the Xilinx SDAccel development environment. An FPGA-based Winograd convolution engine is built, and the FPGA layer is shown to cooperate with other layers running on the host processor to run several popular convolutional neural networks. The results show that the implementation achieves 53 GFLOPS on 3 × 3 convolution kernels with a uniform stride. This is an overall FPGA implementation of the Caffe deep learning architecture, including adaptation of the framework, the addition of a Caffe Brew option (OCL), storage synchronization, and enhanced storage flags, rather than an implementation of one specific convolutional neural network.
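The Winograd convolution discussed above trades multiplications for cheap additions, which is what saves DSP resources on an FPGA. As a concrete illustration (a generic sketch using the standard minimal-filtering transforms, not the thesis's FPGA engine), the 1-D case F(2,3) computes 2 outputs of a 3-tap correlation with 4 multiplications instead of 6; the 2-D 3 × 3 case used in CNNs nests the same transforms.

```python
import numpy as np

# Winograd minimal filtering F(2,3): 2 outputs of a 1-D 3-tap
# correlation using 4 elementwise multiplications instead of 6.
# B_T, G, A_T are the standard F(2,3) transform matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 output samples."""
    U = G @ g        # filter transform (precomputable per filter)
    V = B_T @ d      # input transform
    M = U * V        # the 4 multiplications
    return A_T @ M   # output transform

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
# Direct sliding-window correlation, for comparison:
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

Because the filter transform `G @ g` can be precomputed once per filter, only the input and output transforms (additions and shifts) plus the 4 multiplications remain per tile, which is the saving the thesis exploits on chip.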
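The pipeline layer for inter-layer communication mentioned above can be sketched as a chain of stages connected by bounded queues, so that a host-side layer can work on one batch while a device-side layer works on the next. This is a minimal host-side sketch under assumed semantics: the stage functions and the `pipeline` helper are hypothetical stand-ins, not Caffe or SDAccel APIs.

```python
import threading
import queue

def run_stage(fn, inq, outq):
    """One pipeline stage: consume items, apply fn, forward results.
    A None sentinel shuts the stage down and is propagated downstream."""
    while True:
        item = inq.get()
        if item is None:
            outq.put(None)
            break
        outq.put(fn(item))

def pipeline(stages, inputs, maxsize=2):
    """Run each stage in its own thread, linked by bounded queues,
    so stage i+1 overlaps with stage i (like host/FPGA layer overlap)."""
    queues = [queue.Queue(maxsize) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=run_stage,
                                args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()

    def feed():  # feed concurrently so bounded queues never deadlock
        for x in inputs:
            queues[0].put(x)
        queues[0].put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []
    while (y := queues[-1].get()) is not None:
        results.append(y)
    feeder.join()
    for t in threads:
        t.join()
    return results

# Two mock layers, e.g. "conv on the device" then "ReLU on the host":
out = pipeline([lambda x: x * 2, lambda x: max(x, 0)], [1, -3, 5])
```

The bounded queues play the role of the host/device buffers: they provide backpressure, and the sentinel gives a clean shutdown path through every stage.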
Keywords/Search Tags:CNN, SDAccel, Parallel acceleration, FPGA