An Energy Efficient FPGA-based Embedded System For CNN Applications

Posted on:2019-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:W A Xie

Full Text:PDF

GTID:2428330590451662

Subject:Integrated circuit engineering

Abstract/Summary:

In recent years, artificial intelligence has developed rapidly. As deep learning algorithms continue to advance, they have been applied in a growing range of practical scenarios, including computer vision, natural language processing, autonomous driving, smart healthcare, and smart security. However, as the functional and performance demands on artificial intelligence devices increase, traditional computing platforms such as GPUs and CPUs cannot meet the requirements of high energy efficiency, small size, and low cost in practical applications, so other solutions are needed. Motivated by the high computation density and large data volume of Convolutional Neural Networks (CNNs), this thesis studies the "System on a Programmable Chip (SOPC)" embedded design method, which combines the high performance of hardware with the flexibility of software, and proposes an energy-efficient CNN application system with integrated software and hardware. The main research contents are as follows.

First, this thesis implements the most critical modules of the convolutional neural network in hardware and optimizes each according to its computing characteristics. For the convolutional layer, we design an on-chip parallel computing architecture and pipelined computing units on the FPGA, using loop blocking and interchange, parallelization, and data reuse. To match this computing architecture, we also implement a "multi-dimensional storage mapping" caching scheme and a "double-buffered caching" data transmission scheme, using layered and ping-pong strategies. For the pooling layer, we design a comparison-based computing structure that uses a "parallel-to-serial" method. For the activation layer, either a "lookup table mapping" or a "piecewise linear fitting" scheme is adopted and optimized, depending on the activation function.

Second, this thesis embeds the configurable Nios II soft-core processor in the FPGA and assigns part of the CNN computation to it. Software running on the Nios II core performs image acquisition and preprocessing, controls the computational flow of the system, and communicates with external devices through a rich set of peripheral interfaces. In addition, we combine a DMA controller, the Avalon bus, and various storage structures to build a high-speed, stable data transmission path between on-chip and off-chip memory.

Finally, this thesis implements the CNN embedded system on Altera FPGAs with hardware and software working together. We modify and train the YOLO object detection model to achieve 96.74% recognition accuracy, and then implement the CNN part of the network in the system. By checking several key nodes during system operation against the original C model of the network, we verify the correctness of the entire system. The system achieves a peak throughput of 89.28 GOPS on the Stratix V platform at an operating frequency of 180 MHz. Power consumption on the Cyclone IV platform is only 1.35 W, and the highest energy efficiency is 44.09 GOPS/W.
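The two activation-layer schemes named in the abstract, lookup table mapping and piecewise linear fitting, can be illustrated with a minimal software sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the choice of the sigmoid function, the input range of [-8, 8], the 256-entry table, the 16 linear segments, and the use of floating point in place of the fixed-point arithmetic a hardware design would normally use.

```python
import math


def sigmoid(x):
    """Reference activation used to fill the tables (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_lut(lo=-8.0, hi=8.0, entries=256):
    """Precompute sigmoid at evenly spaced points, like a ROM in hardware."""
    step = (hi - lo) / (entries - 1)
    return [sigmoid(lo + i * step) for i in range(entries)]


def lut_sigmoid(x, table, lo=-8.0, hi=8.0):
    """Lookup table mapping: clamp the input, then read the nearest entry."""
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * (len(table) - 1))
    return table[idx]


def build_segments(lo=-8.0, hi=8.0, segments=16):
    """Fit one (slope, intercept) pair per equal-width segment using its endpoints."""
    width = (hi - lo) / segments
    params = []
    for i in range(segments):
        x0 = lo + i * width
        y0, y1 = sigmoid(x0), sigmoid(x0 + width)
        slope = (y1 - y0) / width
        params.append((slope, y0 - slope * x0))
    return params


def pwl_sigmoid(x, params, lo=-8.0, hi=8.0):
    """Piecewise linear fitting: select the segment, then one multiply and one add."""
    x = min(max(x, lo), hi)
    idx = min(int((x - lo) / (hi - lo) * len(params)), len(params) - 1)
    slope, intercept = params[idx]
    return slope * x + intercept


if __name__ == "__main__":
    table = build_lut()
    params = build_segments()
    for x in (-3.0, 0.0, 1.5):
        print(x, sigmoid(x), lut_sigmoid(x, table), pwl_sigmoid(x, params))
```

The trade-off this sketch is meant to show is that a lookup table costs memory but no arithmetic, while piecewise linear fitting stores only a few coefficients but needs a multiplier and an adder per evaluation.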

Keywords/Search Tags:

FPGA, CNN, Hardware Acceleration, Embedded system, Energy efficiency

Related items

1	An Energy Efficient FPGA Hardware Architecture for the Acceleration of OpenCV Object Detection
2	Research Of Hardware Acceleration Of Embedded Operating System Based On Qt/Embedded Library
3	Design And FPGA Implementation Of Convolutional Neural Network Acceleration Module
4	Low-Cost Hardware Profiling of Run-Time and Energy in FPGA Soft Processors
5	System-on-a-Chip (SoC) based hardware acceleration in Register Transfer Level (RTL) design
6	Using Hardware Acceleration To Improve The Efficiency Of Face Detection
7	Hardware Acceleration For Relational Databases On FPGA
8	Research And Implementation Of Convolutional Neural Network Acceleration Method Based On FPGA
9	Design And Implementation Of Hardware Acceleration Architecture Of Physical Layer Protocol Stack Based On FPGA
10	Software And Hardware Acceleration Design Of Shift Convolutional Neural Networks