Research On Algorithms Of Implementing Convolutional Neural Networks By Hardware

Posted on:2019-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:J Ma

Full Text:PDF

GTID:2428330590967438

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, Convolutional Neural Networks (CNNs) have made breakthrough progress in fields such as image classification, object detection and video semantic analysis, and their powerful feature-learning and classification capabilities have drawn wide attention. However, because of their large number of parameters and heavy computational load, CNNs are not easy to deploy directly on hardware platforms with limited computing and storage resources, so efficient hardware implementations of CNNs are urgently needed. This thesis studies algorithms for implementing CNNs on FPGAs. To suit the characteristics of hardware, it proposes code optimization methods based on block processing, cross-channel calculation, data caching and parallel computing, and ultimately designs a dataflow suitable for hardware implementation. In addition, the fully-connected layers are optimized by matrix decomposition, which reduces both the number of parameters and the amount of computation. Moreover, all data and parameters are converted to fixed-point representation, which effectively relieves bandwidth pressure and improves processing efficiency.

Based on these hardware optimization techniques, the thesis presents an FPGA implementation scheme for Tiny-YOLO, a network used for object detection. To make full use of the FPGA's limited storage and bandwidth resources, a hierarchical memory structure is introduced: off-chip DDR SDRAM stores the input and output data, on-chip BRAM serves as the first-level cache, and registers serve as the second-level cache. This hierarchy improves data reuse and reduces the bandwidth pressure of data access. The thesis also discusses how to distribute computing resources rationally and designs general parallel computing elements (PEs) for more efficient data processing. For the Tiny-YOLO network, it analyzes the choice of sub-block sizes and determines a configuration that balances computation against memory bandwidth.

Finally, the design is synthesized and validated with the Vivado High-Level Synthesis (HLS) tool. A testing scheme that applies unit testing, integration testing and system testing in turn is proposed, and simulation results are obtained on the VC707 FPGA platform. The synthesis results are analyzed to check hardware resource utilization and to verify that the hardware design is sound. Synthesis and simulation results show that the system runs at 143 MHz and achieves a processing speed of 21 FPS.
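
Block processing combined with cross-channel calculation is commonly realized on FPGAs as loop tiling. The Python sketch below illustrates only that general idea and is not the thesis's HLS code. It assumes a stride-1 convolution without padding, and the tile sizes `Tm` (output channels), `Tn` (input channels), `Tr` and `Tc` (output rows and columns) are hypothetical parameters. The list `out_buf` stands in for an on-chip output buffer: partial sums from each block of input channels are accumulated into it, and it is written back once per output tile.

```python
from typing import List

Tensor3 = List[List[List[int]]]        # [channel][row][col]
Tensor4 = List[List[List[List[int]]]]  # [out_ch][in_ch][k_row][k_col]


def zeros3(a: int, b: int, c: int) -> Tensor3:
    return [[[0] * c for _ in range(b)] for _ in range(a)]


def conv_direct(x: Tensor3, w: Tensor4) -> Tensor3:
    """Reference stride-1, no-padding convolution."""
    M, N, K = len(w), len(x), len(w[0][0])
    R, C = len(x[0]) - K + 1, len(x[0][0]) - K + 1
    y = zeros3(M, R, C)
    for m in range(M):
        for r in range(R):
            for c in range(C):
                for n in range(N):
                    for i in range(K):
                        for j in range(K):
                            y[m][r][c] += w[m][n][i][j] * x[n][r + i][c + j]
    return y


def conv_tiled(x: Tensor3, w: Tensor4, Tm: int = 2, Tn: int = 2,
               Tr: int = 3, Tc: int = 3) -> Tensor3:
    """Same convolution, computed tile by tile (hypothetical tile sizes)."""
    M, N, K = len(w), len(x), len(w[0][0])
    R, C = len(x[0]) - K + 1, len(x[0][0]) - K + 1
    y = zeros3(M, R, C)
    for m0 in range(0, M, Tm):
        for r0 in range(0, R, Tr):
            for c0 in range(0, C, Tc):
                # Edge tiles may be smaller than the nominal tile size.
                tm, tr, tc = min(Tm, M - m0), min(Tr, R - r0), min(Tc, C - c0)
                out_buf = zeros3(tm, tr, tc)  # stands in for on-chip BRAM
                for n0 in range(0, N, Tn):
                    tn = min(Tn, N - n0)
                    # This step reads only tn input channels of a
                    # (tr + K - 1) x (tc + K - 1) input window.
                    for mm in range(tm):
                        for nn in range(tn):
                            for rr in range(tr):
                                for cc in range(tc):
                                    acc = 0
                                    for i in range(K):
                                        for j in range(K):
                                            acc += (w[m0 + mm][n0 + nn][i][j]
                                                    * x[n0 + nn][r0 + rr + i][c0 + cc + j])
                                    # Cross-channel accumulation of partial sums.
                                    out_buf[mm][rr][cc] += acc
                # Write the finished output tile back once.
                for mm in range(tm):
                    for rr in range(tr):
                        for cc in range(tc):
                            y[m0 + mm][r0 + rr][c0 + cc] = out_buf[mm][rr][cc]
    return y


# Small integer example: 3 input channels of 6x6, 4 output channels, 3x3 kernels.
x = [[[(n * 36 + r * 6 + c) % 7 - 3 for c in range(6)] for r in range(6)]
     for n in range(3)]
w = [[[[(m + n + 3 * i + j) % 5 - 2 for j in range(3)] for i in range(3)]
      for n in range(3)] for m in range(4)]
print(conv_tiled(x, w) == conv_direct(x, w))  # True
```

Tiling only reorders the arithmetic, so with integer inputs the tiled and direct results match exactly. In hardware terms, larger tiles generally mean fewer off-chip transfers but more BRAM, which is the kind of trade-off a sub-block size analysis has to settle.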
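
The abstract says only that the fully-connected layers are compressed by matrix decomposition. The keyword list names SVD, so the sketch below assumes a truncated SVD. It factors a weight matrix `W` (m x n) into `A` (m x k) and `B` (k x n), so a layer computing `W @ v` becomes `A @ (B @ v)`, and the weights (and multiply-accumulates per input vector) fall from m*n to k*(m + n). The rank `k`, the matrix sizes and the helper names are hypothetical, and NumPy is used for convenience.

```python
from typing import Optional

import numpy as np


def svd_factor(W: np.ndarray, k: int):
    """Rank-k factorization W ~= A @ B via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # scale column j of U by singular value s[j]
    B = Vt[:k, :]
    return A, B


def fc_params(m: int, n: int, k: Optional[int] = None) -> int:
    """Weights (= MACs per input vector, bias ignored) of an m x n FC layer,
    or of its rank-k factorization if k is given."""
    return m * n if k is None else k * (m + n)


# Hypothetical example: a 256 x 512 layer whose weights happen to be rank 16.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 512))
A, B = svd_factor(W, k=16)
v = rng.standard_normal(512)

rel_err = np.linalg.norm(W @ v - A @ (B @ v)) / np.linalg.norm(W @ v)
print(fc_params(256, 512), fc_params(256, 512, 16))  # 131072 12288
print(rel_err < 1e-8)                                 # True
```

Trained weights are usually not exactly low-rank, so in practice the choice of `k` trades accuracy against size; the exactly rank-16 matrix here is contrived so that the check is clean.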
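
The abstract does not state the word length or how the binary point is placed, so the sketch below assumes a hypothetical 16-bit signed format in which the number of fractional bits is chosen from a tensor's largest magnitude (a common "dynamic fixed-point" style). Values are rounded to the nearest step and saturated to the representable range; a fixed-point product carries twice the fractional bits and is shifted back. With 16-bit words, each value needs half the storage and transfer bandwidth of a 32-bit float. All names here are illustrative.

```python
import math
from typing import Iterable

WORD_BITS = 16  # hypothetical word length


def sat(q: int, word_bits: int = WORD_BITS) -> int:
    """Saturate q to the signed word_bits-bit range."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def choose_frac_bits(values: Iterable[float], word_bits: int = WORD_BITS) -> int:
    """Give the integer part just enough bits for max |value|; the rest are fractional."""
    max_abs = max((abs(v) for v in values), default=0.0)
    int_bits = 0
    if max_abs > 0:
        int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return max(0, word_bits - 1 - int_bits)  # one bit is reserved for the sign


def to_fixed(x: float, frac_bits: int) -> int:
    """Round to nearest (Python's round: ties to even), then saturate."""
    return sat(round(x * (1 << frac_bits)))


def to_float(q: int, frac_bits: int) -> float:
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two values with the same frac_bits; the raw product has 2*frac_bits."""
    p = a * b
    if frac_bits > 0:
        p = (p + (1 << (frac_bits - 1))) >> frac_bits  # round half up, drop frac_bits
    return sat(p)


weights = [0.75, -0.3, 1.5, -2.25]
f = choose_frac_bits(weights)                 # 2 integer bits cover |x| < 4
q = [to_fixed(v, f) for v in weights]
print(f, q)                                   # 13 [6144, -2458, 12288, -18432]
print(to_float(fixed_mul(q[2], q[3], f), f))  # -3.375
```

Choosing the fractional bits per tensor rather than globally lets small-magnitude weights keep more precision; whether the thesis uses a per-layer or a global format is not stated here.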

Keywords/Search Tags:

Convolutional Neural Network, SVD, Fixed-Point Processing, FPGA

Related items

1	Research On Key Problems Of Fixed-point For Convolutional Neural Network
2	Research On Algorithm Of Convolutional Neural Network Suitable For Engineering Implementation
3	Research On Key Techniques Of Deep Convolutional Neural Network Accelerators Based On FPGA Bus Framework
4	Study And FPGA Implementation Of Handwritten Letters Recognition Based On Convolutional Neural Network
5	Research On Parallel Computing Architecture Of Siamese Network Algorithm
6	Design And Application Of Convolutional Neural Network Accelerator Based On FPGA
7	Design And Research Of Convolutional Neural Network Accelerator Based On PYNQ Embedded Platform
8	Fixed-Point Inference Of Neural Image Compression
9	Image Feature Point Extraction Based On Neural Network Is Implemented On FPGA
10	Research And Design Of Key Technology Of FPGA-based Convolutional Neural Network Accelerator