
Study Of Heterogeneous Multi-core Acceleration Methods For Convolutional Neural Networks On Reconfigurable Platform

Posted on: 2020-02-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Gong
Full Text: PDF
GTID: 1368330575966581
Subject: Computer system architecture
Abstract/Summary:
Convolutional neural networks (CNNs) originate from traditional artificial neural networks. As an important class of machine learning algorithms, they have been widely deployed in application scenarios such as artificial intelligence and computer vision. As real-world applications grow more complex, the scale and depth of network models keep increasing, posing a great challenge to the performance and energy efficiency of traditional computing platforms. In this context, hardware acceleration based on ASICs and FPGAs has been widely used in the deployment of CNNs and has become an effective way to improve computational efficiency. However, the single-core architecture and computation model of mainstream accelerators conflict with the inherent characteristics of CNNs; on reconfigurable devices such as FPGAs, the reconfigurability further highlights this conflict, which significantly hinders further improvement of computational efficiency.

Aiming at efficient hardware deployment of CNNs, this dissertation combines reconfigurable computing technology with heterogeneous multi-core architecture, and systematically proposes accelerator design and optimization methods based on heterogeneous multi-core architectures in both static and dynamic reconfiguration manners, effectively alleviating the mismatch between hardware and software computing features in hardware acceleration. The specific contents and innovations are as follows:

· At the static reconfiguration level, we propose a heterogeneous multi-core architecture that solidifies all network layers on chip for deploying a specific CNN model on a specific FPGA platform, in which the computation of each layer is mapped to an exclusive core. Locally, each core can be deployed and optimized separately according to the parallelism characteristics of its corresponding layer; at the macro level, inter-layer parallelism is fully exploited by pipelining computation between different layers. On this basis, we use a multi-core Roofline performance analysis model to coordinate on-chip computation and off-chip memory accesses. For the deployment of AlexNet and VGG16D on a high-performance FPGA platform, this architecture achieves 2.44× and 2.35× improvements in performance and energy efficiency, respectively, over previous implementations on the same FPGA platform.

· Building on the multi-core architecture with all layers solidified on chip, we further propose a layer-feature-wise multi-core architecture at the static reconfiguration level. By analyzing the hardware acceleration process of CNNs, we find two rules. First, different convolutional layers have different memory access patterns for different data types, so deploying them separately in a multi-core architecture can minimize access overhead. Second, although the structures of different layers differ, some of them exhibit similar computing and memory access behaviors after loop unrolling and loop tiling, so hardware multiplexing between these layers can achieve higher hardware utilization. Based on these two rules, we propose a coarse-grained and a fine-grained layer clustering method for layer-wise features. On this basis, we increase the granularity of feature matching between hardware and software and propose a layer-feature-oriented multi-core acceleration method. For the deployment of AlexNet, VGG16C, VGG16D, and VGG19 on a high-performance FPGA platform, this architecture achieves 1.64× and 1.84× improvements in performance and energy efficiency, respectively, over previous implementations on the same FPGA platform.

· At the dynamic reconfiguration level, we propose a heterogeneous multi-core architecture with dynamic adaptivity between hardware and software, based on the partial reconfiguration technique of FPGAs. For the first time, we introduce the dynamic partial reconfiguration of FPGA devices into the design of CNN hardware accelerators, providing a mechanism for dynamically adjusting the underlying hardware architecture according to the computing features of the application at runtime. On this basis, we model the hardware acceleration process as a Markov decision process and determine the optimal runtime reconfiguration strategy for the accelerator deployment of a specific CNN model by means of deep reinforcement learning, so as to fully exploit hardware reconfigurability and improve computation adaptability. For the deployment of AlexNet and VGG16D on an embedded FPGA platform, this architecture achieves a 1.48× improvement in performance density over previous implementations.
Keywords/Search Tags: CNNs, Hardware acceleration, FPGA, Computing adaptability, Heterogeneous computing