Optimal Design Of Key Operator Circuits For Convolutional Neural Networks In Image Recognition

Posted on:2020-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:W J Xu

Full Text:PDF

GTID:2428330626450764

Subject:Integrated circuit engineering

Abstract/Summary:

Artificial intelligence is currently driving the computer vision market, including image recognition and video recognition. Because the convolutional neural network (CNN) is a commonly used processing model in image recognition, accelerating its inference process has become a research hotspot. With its efficiency and flexibility, the coarse-grained reconfigurable array (CGRA) is an ideal platform for accelerating CNN inference. This thesis therefore optimizes the architecture of an existing CGRA to accelerate CNN inference in image recognition. It targets the key operator of the inference process and optimizes the array structure and on-chip memory structure of REMUS-II. The following work is done to improve processing element (PE) utilization and throughput. First, the inference process of CNNs in image recognition is analyzed, and convolutional layers are identified as the key operator through theoretical evaluation and experimental analysis. Second, the data flow characteristics of convolutional layers are analyzed, and a hybrid mapping scheme is proposed that combines parallelism across convolutional windows with parallelism across output feature maps. According to this mapping scheme, the PE, array size, and interconnect structure of REMUS-II are optimized. Third, based on the large data storage requirements and the data reuse in convolutional layers, a hybrid storage strategy for reused data is proposed, and the multi-level on-chip memory structure of REMUS-II is optimized. The optimized memory structure consists of multi-bank input and output buffers, a multi-channel data reuse cache, and a local weight buffer. All buffers operate in ping-pong mode to hide data transmission delay. RTL simulation results on the AlexNet and VGG-16 models show that, after the array structure and memory structure of REMUS-II are optimized, PE utilization reaches 87.41%, an increase of 8.41% over the design before optimization; peak throughput is 119.95 GOP/s at a working frequency of 150 MHz, an increase of 19.95% over the design before optimization. Compared with the EMAX architecture, peak throughput and PE utilization are improved by 1.34× and 37.41%, respectively.
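The hybrid mapping idea in the abstract, parallelism across convolutional windows combined with parallelism across output feature maps, can be illustrated with a small software sketch. This is not the REMUS-II mapping itself: the function names, the parameters `pm` (output feature maps handled per array pass) and `pw` (windows handled per array pass), and the slot-occupancy definition of utilization are all assumptions made for illustration, and the convolution assumes stride 1, no padding, and square kernels.

```python
from itertools import product

def mac(ifmap, kernel, oy, ox):
    """One convolution window: multiply-accumulate over channels and kernel taps."""
    k = len(kernel[0])  # square kernel assumed
    return sum(ifmap[c][oy + ky][ox + kx] * kernel[c][ky][kx]
               for c, ky, kx in product(range(len(kernel)), range(k), range(k)))

def conv_reference(ifmap, weights):
    """Plain convolution: ifmap[c][y][x], weights[m][c][ky][kx]."""
    k = len(weights[0][0])
    oh, ow = len(ifmap[0]) - k + 1, len(ifmap[0][0]) - k + 1
    return [[[mac(ifmap, w, oy, ox) for ox in range(ow)]
             for oy in range(oh)] for w in weights]

def conv_hybrid(ifmap, weights, pm, pw):
    """Same convolution, tiled so each 'array pass' covers up to
    pm output feature maps x pw windows (hypothetical parameters).
    Returns (output, slot utilization of the pm*pw array)."""
    k = len(weights[0][0])
    oh, ow = len(ifmap[0]) - k + 1, len(ifmap[0][0]) - k + 1
    windows = list(product(range(oh), range(ow)))
    out = [[[0] * ow for _ in range(oh)] for _ in weights]
    passes = 0
    for m0 in range(0, len(weights), pm):          # tile over output feature maps
        for w0 in range(0, len(windows), pw):      # tile over convolution windows
            passes += 1
            # In hardware these pm*pw window computations would run in parallel.
            for m in range(m0, min(m0 + pm, len(weights))):
                for oy, ox in windows[w0:w0 + pw]:
                    out[m][oy][ox] = mac(ifmap, weights[m], oy, ox)
    # Fraction of array slots doing useful work (illustrative metric only).
    utilization = len(weights) * len(windows) / (passes * pm * pw)
    return out, utilization

# Toy data: 2 input channels of 4x4, 3 output feature maps, 3x3 kernels.
ifmap = [[[c + 4 * y + x for x in range(4)] for y in range(4)] for c in range(2)]
weights = [[[[m + c + ky - kx for kx in range(3)] for ky in range(3)]
            for c in range(2)] for m in range(3)]
out, util = conv_hybrid(ifmap, weights, pm=2, pw=4)
```

With 3 output feature maps and `pm=2`, the second pass over the maps leaves half of the slots idle. This kind of mismatch between layer shape and array shape is what a mapping scheme has to balance, although the utilization figure computed here is a toy metric and is not comparable to the RTL results reported in the abstract.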

Keywords/Search Tags:

convolutional neural network, key operator acceleration, coarse-grained reconfigurable architecture, array structure optimization, memory structure optimization

Related items

1	Design And Optimization Of Reconfigurable Array For DCT And IDCT
2	Design And Application Of Coarse-grained Reconfigurable Neuromorphic Array
3	Research On Neural Network Implementation Method Based On Coarse Granular Reconfigurable Array Architecture
4	Research On The Design Methodology Of Application Specific Coarse Grained Reconfigurable System On Chip
5	General-purpose Algorithms Implementation And Optimization For Coarse-grained Dynamically Reconfigurable Processor
6	Data Memory Structure Design Of Coarse-grained Reconfigurable Processor For Radar Applications
7	Research On Architecture Design And Modeling Method Of Coarse-Grained Reconfigurable Array Based On Dataflow Decoupling
8	Research On Performance Optimization Of Coarse-grain Reconfigurable Array Processor
9	Circuit Design And Optimization For Key Operator Of Block Encryption Algorithm
10	Design And Optimization Of Configuration-Path Sub-System In Coarse-Grained Reconfigurable Processor For Radar Applications