Research On Algorithms Of Implementing Convolutional Neural Networks By Hardware

Posted on: 2019-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2428330590967438
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, convolutional neural networks (CNNs) have made breakthrough progress in image classification, object detection, video semantic analysis, and other fields, and their powerful feature learning and classification capabilities have drawn wide attention. Because of their large number of parameters and high computational complexity, CNNs are difficult to deploy directly on hardware platforms with limited computing and storage resources. Efficient hardware implementations of CNNs are therefore urgently needed.

This thesis studies algorithms for implementing CNNs on FPGAs. To accommodate hardware characteristics, we propose code optimization methods based on block processing, cross-channel calculation, data caching, and parallel computing, and design a dataflow suited to hardware implementation. In addition, the fully-connected layers are optimized by matrix decomposition, which reduces both the number of parameters and the amount of computation. All data and parameters are also converted to fixed-point representation, which effectively relieves bandwidth pressure and improves processing efficiency.

Based on the proposed hardware optimization techniques, this thesis presents an FPGA implementation scheme for the Tiny-YOLO network, which is used for object detection. To fully utilize the limited storage and bandwidth resources of the FPGA, we introduce a hierarchical memory structure: off-chip DDR SDRAM stores input and output data, on-chip BRAM serves as the first-level cache, and registers serve as the second-level cache. This structure improves data reuse and reduces the bandwidth pressure of data access. We discuss how to allocate computing resources rationally, then design general parallel computing elements (PEs) to process data more efficiently. For the Tiny-YOLO network, we discuss the choice of sub-block size and settle on a solution that balances computation and memory bandwidth.

Finally, we use the Vivado High-Level Synthesis (HLS) tool to synthesize and validate the design. We propose a testing scheme that applies unit testing, integration testing, and system testing to the design, and obtain simulation results on the VC707 FPGA platform. We analyze the synthesis results to assess hardware resource usage and verify the soundness of the hardware design. Synthesis and simulation results show that the system runs at 143 MHz and achieves a processing speed of 21 FPS.
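
The fixed-point conversion mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the word length or the number of fractional bits used in the thesis, so the 16-bit word with 8 fractional bits below is an assumption, and the function names `to_fixed`, `to_float`, and `fixed_dot` are hypothetical. The sketch shows the general idea only: values are scaled to integers, saturated to the word range, and accumulated in a wider register before being rescaled.

```python
def to_fixed(x, frac_bits=8, word_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow.

    frac_bits and word_bits are illustrative defaults, not values from the thesis.
    """
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = int(round(x * (1 << frac_bits)))
    return max(lo, min(hi, q))


def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights_q, inputs_q, frac_bits=8, word_bits=16):
    """Dot product of two fixed-point vectors.

    Each product carries 2 * frac_bits fractional bits, so the sum is kept
    in a wide accumulator and shifted back to frac_bits at the end.
    """
    acc = 0
    for w, x in zip(weights_q, inputs_q):
        acc += w * x
    # Shift back down, adding half an LSB first so the result is rounded.
    acc = (acc + (1 << (frac_bits - 1))) >> frac_bits
    # Saturate the result to the output word range.
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, acc))


if __name__ == "__main__":
    weights = [to_fixed(v) for v in (0.5, 0.25)]
    inputs = [to_fixed(v) for v in (2.0, 4.0)]
    print(to_float(fixed_dot(weights, inputs)))  # prints 2.0
```

Halving the word length relative to 32-bit floating point halves the data moved per value, which is the bandwidth relief the abstract refers to; the price is quantization error and the need to saturate or rescale, as the accumulator handling above shows.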
Keywords/Search Tags: Convolutional Neural Network, SVD, Fixed-Point Processing, FPGA