
Optimal Design Of Key Operator Circuits For Convolutional Neural Networks In Image Recognition

Posted on: 2020-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: W J Xu
GTID: 2428330626450764
Subject: Integrated circuit engineering
Abstract/Summary:
At present, artificial intelligence is driving the computer vision market, including image recognition and video recognition. Because the convolutional neural network (CNN) is a commonly used processing model in image recognition, accelerating its inference process has become a research hotspot. Owing to its efficiency and flexibility, the coarse-grained reconfigurable array (CGRA) is an ideal platform for accelerating CNN inference. This thesis therefore optimizes an existing CGRA architecture to accelerate CNN inference in image recognition. Targeting the key operator of the inference process, it optimizes the array structure and the on-chip memory structure of REMUS-II. The following work is done to improve processing element (PE) utilization and throughput.

Firstly, the inference process of CNNs in image recognition is analyzed, and the convolutional layer is identified as the key operator through theoretical evaluation and experimental analysis.

Secondly, the data flow characteristics of convolutional layers are analyzed, and a hybrid mapping scheme that combines parallelism across convolution windows with parallelism across output feature maps is proposed. Based on this mapping scheme, the PE, the array size and the interconnect structure of REMUS-II are optimized.

Thirdly, because convolutional layers store large amounts of data and reuse much of it, a hybrid storage strategy for reused data is proposed, and the multi-level on-chip memory structure of REMUS-II is optimized accordingly. The optimized structure consists of multi-bank input and output buffers, a multi-channel data reuse cache and a local weight buffer. All buffers operate in ping-pong mode to hide data transmission delay.

RTL simulation experiments on the AlexNet and VGG-16 models show that, after the array structure and memory structure of REMUS-II are optimized, PE utilization reaches 87.41%, an increase of 8.41% over the design before optimization, and peak throughput reaches 119.95 GOP/s at a 150 MHz working frequency, an increase of 19.95% over the design before optimization. Compared with the EMAX architecture, the design in this thesis improves peak throughput by 1.34× and PE utilization by 37.41%.
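To illustrate why convolutional layers are the key operator, the sketch below counts multiply-accumulate (MAC) operations per layer type for VGG-16, one of the two models used in the experiments. It is a minimal sketch, not the thesis's evaluation method: it assumes the standard VGG-16 configuration (224×224×3 input, 3×3 convolutions with padding 1, 2×2 max pooling, three fully connected layers), and the helper names `conv_macs`, `fc_macs` and `vgg16_macs` are hypothetical.

```python
# Hypothetical helpers: count MACs per layer type for VGG-16 (assumed standard configuration).

def conv_macs(h_out, w_out, c_in, c_out, k):
    """MACs of one convolutional layer: each output pixel of each map needs c_in * k * k MACs."""
    return h_out * w_out * c_out * c_in * k * k


def fc_macs(n_in, n_out):
    """MACs of one fully connected layer."""
    return n_in * n_out


def vgg16_macs():
    # (number of 3x3 conv layers, output channels) per block
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    size, c_in = 224, 3
    conv_total = 0
    for n_layers, c_out in blocks:
        for _ in range(n_layers):
            # padding 1 keeps the spatial size unchanged for 3x3 kernels
            conv_total += conv_macs(size, size, c_in, c_out, 3)
            c_in = c_out
        size //= 2  # 2x2 max pooling with stride 2 after each block
    # after five poolings the feature map is 7 x 7 x 512 = 25088 values
    fc_total = fc_macs(7 * 7 * 512, 4096) + fc_macs(4096, 4096) + fc_macs(4096, 1000)
    return conv_total, fc_total


conv, fc = vgg16_macs()
share = conv / (conv + fc)
print(f"conv MACs: {conv:,}  fc MACs: {fc:,}  conv share: {share:.1%}")
```

Under these assumptions the convolutional layers account for about 15.3 G MACs against about 0.12 G for the fully connected layers, i.e. roughly 99.2% of the arithmetic.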
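The hybrid mapping scheme, which combines parallelism across convolution windows with parallelism across output feature maps, can be illustrated with a small functional model. This is a sketch under stated assumptions, not REMUS-II's actual mapping: it assumes a hypothetical `pm` × `pw` PE grid in which each row works on one output feature map and each column on one convolution window, treats one tile step as an abstract time unit (each PE finishes a whole window per step), and counts idle PEs in partial tiles to show how layer dimensions affect PE utilization.

```python
import numpy as np


def conv2d_reference(x, wt, stride=1):
    """Direct convolution without padding. x: (C, H, W), wt: (M, C, K, K) -> (M, Ho, Wo)."""
    _, h, w = x.shape
    m_out, _, k, _ = wt.shape
    ho, wo = (h - k) // stride + 1, (w - k) // stride + 1
    y = np.zeros((m_out, ho, wo))
    for m in range(m_out):
        for i in range(ho):
            for j in range(wo):
                patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
                y[m, i, j] = np.sum(patch * wt[m])
    return y


def conv2d_hybrid(x, wt, pm, pw, stride=1):
    """Hypothetical hybrid mapping on a pm x pw PE grid.

    Row r handles output feature map m0 + r (output-map parallelism);
    column c handles convolution window w0 + c (window parallelism).
    Returns the output and the fraction of PE slots that did useful work.
    """
    _, h, w = x.shape
    m_out, _, k, _ = wt.shape
    ho, wo = (h - k) // stride + 1, (w - k) // stride + 1
    windows = [(i, j) for i in range(ho) for j in range(wo)]
    y = np.zeros((m_out, ho, wo))
    steps = busy = 0
    for m0 in range(0, m_out, pm):              # tile over output feature maps
        for w0 in range(0, len(windows), pw):   # tile over convolution windows
            steps += 1  # one step: every PE in the grid works concurrently
            for r in range(pm):
                for col in range(pw):
                    m, n = m0 + r, w0 + col
                    if m >= m_out or n >= len(windows):
                        continue  # this PE idles in a partial tile
                    i, j = windows[n]
                    patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
                    y[m, i, j] = np.sum(patch * wt[m])
                    busy += 1
    return y, busy / (steps * pm * pw)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5))      # 2 input channels, 5x5 input
wt = rng.standard_normal((6, 2, 3, 3))  # 6 output maps, 3x3 kernels -> 9 windows per map
y, util = conv2d_hybrid(x, wt, pm=4, pw=4)
print(np.allclose(y, conv2d_reference(x, wt)), f"PE utilization: {util:.2%}")
```

With 6 output maps and 9 windows on a 4 × 4 grid, the partial tiles leave PEs idle and utilization falls to 54/96 = 56.25%. Matching the grid dimensions to the layer shapes is one way a mapping can raise utilization.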
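Convolutional layers reuse each input pixel across overlapping windows, which is what makes a dedicated data reuse cache worthwhile. The following minimal sketch (a hypothetical helper for a single layer without padding) compares the input fetches of a naive design, in which every window re-reads its full C × K × K patch, against the ideal case in which each touched input element is fetched only once. The VGG-16 conv3 shapes in the example are an assumption for illustration.

```python
def input_fetches(c, h, w, k, stride=1):
    """Return (naive, ideal) input-element fetches for one convolutional layer (no padding)."""
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    # naive: every output window re-reads its whole c x k x k patch
    naive = c * h_out * w_out * k * k
    # ideal: each input element touched by some window is read exactly once
    # (for stride > k the windows no longer overlap and leave gaps)
    rows = min((h_out - 1) * stride + k, h_out * k)
    cols = min((w_out - 1) * stride + k, w_out * k)
    ideal = c * rows * cols
    return naive, ideal


# Assumed VGG-16 conv3 shapes: 256 channels, 56x56 output, input padded to 58x58
naive, ideal = input_fetches(256, 58, 58, 3)
print(f"naive: {naive:,}  ideal: {ideal:,}  reuse factor: {naive / ideal:.2f}")
```

For a 3×3 kernel with stride 1 the reuse factor approaches 9 as the feature map grows, which is the redundancy a reuse cache can remove.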
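Ping-pong (double) buffering hides data transmission delay by letting the PE array compute on one buffer while the next tile is loaded into the other. The timing model below is a simplified sketch with hypothetical per-tile load and compute times in cycles; it is not derived from REMUS-II's buffer controller. It assumes a single load channel, in-order computation, and a buffer that stays occupied from the start of its load until the computation consuming it finishes.

```python
def makespan(n_tiles, t_load, t_compute, n_buffers=2):
    """Cycle at which the last tile finishes computing (hypothetical timing model).

    Tile k is loaded into buffer k % n_buffers. n_buffers=1 models no overlap,
    n_buffers=2 models ping-pong buffering.
    """
    load_end, comp_end = [], []
    for k in range(n_tiles):
        # the target buffer must have been consumed, and the single loader must be idle
        buf_free = comp_end[k - n_buffers] if k >= n_buffers else 0
        loader_free = load_end[k - 1] if k >= 1 else 0
        load_end.append(max(buf_free, loader_free) + t_load)
        # the PE array processes tiles in order, one at a time
        array_free = comp_end[k - 1] if k >= 1 else 0
        comp_end.append(max(load_end[k], array_free) + t_compute)
    return comp_end[-1] if comp_end else 0


print(makespan(4, t_load=3, t_compute=5, n_buffers=1))  # single buffer: 32
print(makespan(4, t_load=3, t_compute=5, n_buffers=2))  # ping-pong: 23
```

With 4 tiles, 3 cycles to load and 5 to compute, a single buffer needs 4 × (3 + 5) = 32 cycles, while two buffers need 3 + 4 × 5 = 23, because every load after the first overlaps with computation.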
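The throughput figures can be cross-checked with simple arithmetic. Dividing peak throughput by the clock frequency gives the operations completed per cycle, and the stated 19.95% gain implies the throughput before optimization. Counting one MAC as two operations is an assumption, since the abstract does not define how operations are counted.

```latex
$$\frac{119.95\times 10^{9}\ \text{OP/s}}{150\times 10^{6}\ \text{cycles/s}} \approx 799.7\ \text{OP/cycle} \approx 400\ \text{MAC/cycle}\quad(\text{assuming } 1\ \text{MAC} = 2\ \text{OP})$$

$$\text{throughput before optimization} = \frac{119.95\ \text{GOP/s}}{1 + 0.1995} = 100.0\ \text{GOP/s}$$
```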
Keywords/Search Tags: convolutional neural network, key operator acceleration, coarse-grained reconfigurable architecture, array structure optimization, memory structure optimization