
Research On Hardware Parallel Acceleration For Novel Convolutional Neural Networks

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: D G Wang
Full Text: PDF
GTID: 2518306548995939
Subject: Computer Science and Technology

Abstract/Summary:
As one of the most popular algorithms in deep learning, Convolutional Neural Networks (CNNs) have achieved great success and are widely used in speech recognition, image segmentation, image recognition, and other fields. To improve CNN performance, the number and size of network layers have grown steadily. However, simply adding more layers has run into bottlenecks, and novel convolutional neural networks have been proposed, such as deconvolution neural networks and complicated-connected convolutional neural networks. These network models have more complex structures and much greater computational complexity. Traditional general-purpose CPUs can no longer meet the computational demands of modern convolutional neural networks because of their low degree of parallelism and limited computing capability. Therefore, to enable large-scale applications of convolutional neural networks, many hardware accelerators have emerged, such as GPUs, FPGAs, and ASICs. Thanks to its reconfigurability, rich computational resources, and low power consumption, the FPGA is favored in hardware acceleration research for convolutional neural networks. Previous FPGA-based acceleration efforts focused on the design and optimization of accelerators for traditional convolutional neural networks, but FPGA accelerators for novel convolutional neural networks are still lacking.

In this paper, we present an FPGA-based sparse deconvolution neural network accelerator architecture. We implemented our design on the Xilinx VC709 development platform and evaluated the accelerator's resource utilization. Finally, we tested the performance of four practical deconvolution neural network models.

With the development of deep learning, the structure of convolutional neural networks has become more complex, the parameters have grown larger, and the computing and storage requirements have risen accordingly. The limited on-chip computing and storage resources of a single FPGA can hardly meet the needs of mapping an entire network, which makes it difficult to raise the acceleration efficiency of a single FPGA. In this paper, we present an efficient design flow for accelerating complicated-connected convolutional neural networks on multi-FPGA platforms, including directed acyclic graph (DAG) abstraction, mapping scheme generation, and design space exploration. Finally, we built a multi-FPGA system that supports flexible communication between FPGAs to support our design flow. We chose three complicated-connected convolutional neural networks, GoogLeNet, DenseNet, and LNS-net, as benchmarks. The experimental results show that the proposed multi-FPGA system achieves much higher throughput and energy efficiency than CPU and GPU implementations.
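The workload behind the first contribution can be illustrated with a minimal sketch: a transposed convolution ("deconvolution") is equivalent to inserting stride - 1 zeros between input samples and then running an ordinary convolution, and those inserted zeros are exactly the sparsity a sparse deconvolution accelerator exploits. The 1-D example below is purely illustrative, assuming nothing about the thesis's actual hardware design:

```python
# Illustrative sketch: 1-D transposed convolution ("deconvolution") via
# zero insertion followed by an ordinary convolution. The inserted zeros
# are why deconvolution workloads are sparse: many multiply-accumulates
# hit a zero operand and can be skipped by a sparse accelerator.

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between consecutive input elements."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i != len(x) - 1:
            out.extend([0] * (stride - 1))
    return out

def conv1d(x, k):
    """Plain valid-mode 1-D convolution (cross-correlation, no kernel flip)."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]

def deconv1d(x, k, stride):
    """Transposed convolution: zero-insert, pad, then convolve normally."""
    pad = [0] * (len(k) - 1)
    return conv1d(pad + zero_insert(x, stride) + pad, k)

print(deconv1d([1, 2, 3], [1, 1, 1], stride=2))  # [1, 1, 3, 2, 5, 3, 3]
```

With stride 2, roughly half of the operands fed to the convolution are inserted zeros, which is the arithmetic a sparse accelerator avoids.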
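The multi-FPGA design flow's first two steps, DAG abstraction and mapping scheme generation, can likewise be sketched in miniature: model the network as a directed acyclic graph, topologically sort it, and split the sorted layers across FPGAs while balancing compute load. The layer names, cost numbers, and greedy heuristic below are illustrative assumptions, not the thesis's actual algorithm:

```python
# Illustrative sketch of DAG abstraction + mapping scheme generation.
# A complicated-connected network (here a DenseNet-style block whose
# layers all feed a concatenation node) is modeled as a DAG, sorted
# topologically, and greedily partitioned across FPGAs by compute cost.

from collections import defaultdict, deque

def topo_sort(edges):
    """Kahn's algorithm: return nodes of the DAG in dependency order."""
    indeg, succ, nodes = defaultdict(int), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    q = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order

def partition(order, cost, n_fpgas):
    """Greedy contiguous split keeping per-FPGA compute roughly balanced."""
    budget = sum(cost[n] for n in order) / n_fpgas
    mapping, fpga, load = {}, 0, 0.0
    for n in order:
        if load >= budget and fpga < n_fpgas - 1:
            fpga, load = fpga + 1, 0.0
        mapping[n] = fpga
        load += cost[n]
    return mapping

# Hypothetical DenseNet-style block: every conv layer also feeds "concat".
edges = [("conv1", "conv2"), ("conv1", "concat"),
         ("conv2", "conv3"), ("conv2", "concat"),
         ("conv3", "concat"), ("concat", "fc")]
cost = {"conv1": 4.0, "conv2": 4.0, "conv3": 4.0, "concat": 0.5, "fc": 1.5}
print(partition(topo_sort(edges), cost, n_fpgas=2))
```

A real design-space exploration step would then evaluate many such candidate mappings against inter-FPGA communication cost, but the DAG-then-partition structure is the same.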
Keywords/Search Tags: Novel Convolutional Neural Networks, FPGA, Deconvolutional Neural Networks, Complicated-connected Convolutional Neural Networks, Multi-FPGA