
Memory-optimized Access And Dedicated Processor Implementation Based On Convolutional Neural Network

Posted on: 2020-12-01  Degree: Master  Type: Thesis
Country: China  Candidate: S L Li  Full Text: PDF
GTID: 2428330599959727  Subject: Engineering
Abstract/Summary:
Deep convolutional neural networks (CNNs) are widely used in many fields. Because of their particular computation pattern, they exhibit local perception and weight sharing, which give them strong invariance to translation, scale, and deformation in image processing. CNNs achieve very high accuracy in many intelligent applications such as image classification, object detection, semantic recognition, and behavior recognition, but this accuracy comes with a dramatic increase in computation and power consumption. The computational cost of a CNN is enormous: a high-dimensional convolutional layer must process hundreds of filters and channels simultaneously, which causes a large amount of data movement between the processor and memory. A state-of-the-art CNN consists of hundreds of convolutional layers, so the total amount of data movement and computation is extremely large.
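A rough back-of-the-envelope calculation makes this scale concrete. The layer dimensions below are assumed for illustration only and are not taken from the thesis; even a single mid-sized convolutional layer requires billions of multiply-accumulate (MAC) operations and involves millions of weights and activations.

# Back-of-the-envelope cost of one convolutional layer (all dimensions assumed).
H, W = 56, 56        # output feature-map height and width
C, K = 256, 256      # input channels and output filters
R, S = 3, 3          # kernel height and width

macs    = H * W * K * C * R * S          # one MAC per kernel element per output pixel
weights = K * C * R * S                  # weight parameters
ifmap   = (H + R - 1) * (W + S - 1) * C  # input activations (stride 1, "valid" sizing)
ofmap   = H * W * K                      # output activations

print(f"{macs / 1e9:.2f} GMAC, {weights / 1e6:.2f} M weights, "
      f"{(ifmap + ofmap) / 1e6:.2f} M activations")

For these assumed dimensions this prints roughly 1.85 GMAC and over two million weights plus activations, and a full network stacks hundreds of such layers.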
Although the required computational volume and throughput can be met by existing techniques, such as single instruction multiple data (SIMD) in CPUs and single instruction multiple threads (SIMT) in GPUs, the power consumed by computation and by data movement remains high, and the underlying problem of computational efficiency is not solved. This matters especially for IoT terminal computing, which demands low power consumption, real-time response, low cost, a sound architecture, and a flexible framework. The existing CPU+GPU general-purpose computing framework suffers from high power consumption and high latency and therefore cannot meet these requirements. Dedicated neural-network chips have emerged to fill this gap.

To address the problems of the general-purpose computing framework, this thesis designs an application-specific integrated circuit (ASIC) for convolutional neural networks, adopts a new reconfigurable computing framework, and proposes a Vertical Data Stream [41] for this framework. The main research results are as follows.

1. Targeting the characteristics of CNN computation, a computational framework called Coarse-Grained Reconfigurable Neuron Array (CGRNA) is proposed. The framework uses an artificial neural processing element (PE) as its basic computational unit, transfers data through a chain of shift registers attached to the PEs, and employs distributed on-chip SRAM. It can flexibly implement neural networks of various structures and supports convolutional, fully connected, and pooling layers. Experiments show that the framework greatly improves the computational efficiency of CNNs compared with general-purpose computing frameworks, especially for convolutional layers with very high dimensions.

2. A vertical data stream method is proposed for the CGRNA framework. By changing how the feature maps of the network are laid out in memory and reading them vertically, the method raises the reuse rate of both feature data and weight data and thus greatly improves the computational efficiency of the CNN (a toy illustration of this reuse appears after the result list). Experimental results show that this dataflow reduces the computational power consumption, latency, and chip area of the CNN, and ultimately reduces chip cost.

3. A dedicated instruction set is proposed for the CGRNA framework and the vertical data stream. With this instruction set, arbitrary convolutional, fully connected, and pooling layers can be realized, and flexible configuration is achieved by controlling parameters such as the data bit width and the activation function used during computation.
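The following toy model is my own illustration of the general reuse idea behind a vertical (column-wise) streaming order, under assumed dimensions; it is not the thesis's exact scheme. When vertically adjacent kernel windows overlap, holding the last R fetched elements in a short shift-register chain lets each memory word serve R window positions instead of being re-read from memory.

# Toy reuse model (assumed dimensions; not the thesis's exact scheme).
H, W, R = 32, 32, 3              # feature-map height/width and kernel height

reads_naive = 0                  # every vertical window re-reads its R elements
reads_streamed = 0               # each element is fetched from memory only once
for col in range(W):
    shift_reg = []               # models an R-deep shift-register chain per column
    for row in range(H):
        shift_reg.append((row, col))
        reads_streamed += 1
        if len(shift_reg) > R:
            shift_reg.pop(0)
        if len(shift_reg) == R:  # one vertical kernel window is ready
            reads_naive += R

print("reads without reuse:", reads_naive)        # 2880
print("reads with vertical streaming:", reads_streamed)  # 1024

The roughly R-fold reduction in memory reads shown here is the kind of effect a vertical storage and read order is aiming at.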
Keywords/Search Tags:convolutional neural network, ASIC, reconfigurable, vertical data stream, dedicated instruction set