
Research On Key Technologies Of High Performance Accelerator For Convolution And Recurrent Neural Networks

Posted on: 2020-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Xu
Full Text: PDF
GTID: 1488306548492534
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning algorithms have become the mainstream models in the field of machine learning. Among them, convolutional neural networks and recurrent neural networks are particularly effective at the intelligent classification, detection, and recognition of data objects such as images, video, sound, and text, and have become the two main classes of deep neural network models. As the demand for precise sensing and high-accuracy recognition grows, many intelligent applications adopt deeper network structures built from these two model classes and require the support of dedicated high-performance hardware, so acceleration techniques for these two classes of deep networks have become a research hotspot. At the same time, for different data objects and accuracy requirements, convolutional and recurrent network structures take many variant forms in practice, and model design and accelerator optimization must be carried out in conjunction with the specific application domain to achieve good results. This dissertation therefore focuses on the acceleration and optimization of convolutional and recurrent neural networks, studying in depth software and hardware optimization techniques such as model structure design, model lightweighting, and parallel acceleration. It further uses the FPGA platform to study agile design and implementation techniques for the two classes of network models and their application tasks, obtaining energy-efficient implementations of the two most popular kinds of deep neural network applications. The main innovations of this dissertation are:

· A convolutional neural network accelerator based on a layer-folding pipeline model (Chapter 2). This dissertation studies in detail the fully-folded and fully-pipelined structures for accelerating convolutional neural networks and proposes a layer-folding pipeline structure model. By analyzing the structural and computational commonality of the layers in a convolutional neural network, a folded-layer accelerator structure is proposed in which common layers are mapped onto a single computing unit and connected to non-common layers through pipeline stages, forming a pipelined implementation of the entire network. This structural model can balance the degree of layer folding against implementation constraints such as on-chip storage, memory-access bandwidth, and on-chip computing resources, making full use of the given FPGA hardware to obtain optimal throughput. The layer-folding model unifies the various convolutional neural network pipeline structures, with the fully-folded and fully-pipelined structures as two special cases. On this basis, the dissertation proposes a general accelerator framework, designs and implements accelerators for different convolutional neural networks, and proposes a performance analysis model for the framework. Finally, accelerators for AlexNet and VGG16 are implemented on a Xilinx VC709 board, achieving throughputs of 593.5 GOP/s and 638.9 GOP/s respectively; the best result exceeds state-of-the-art convolutional neural network accelerator implementations.

· An automatic accelerator generation model based on the layer-folding pipeline model (Chapter 3). To make the layer-folding pipeline structure applicable to convolutional neural networks whose layer sizes vary, an automatic accelerator generation model is proposed. An analysis model of the layer-folding pipeline structure is established in terms of computing resources, on-chip storage, memory bandwidth, and throughput, and a design space exploration algorithm under these multi-factor constraints is proposed that automatically searches for and generates the optimal logic implementation for given FPGA resource constraints. The automatic generation model effectively shortens the accelerator development cycle for convolutional neural networks and greatly improves ease of use. Three mainstream convolutional neural networks, AlexNet, VGG-S, and VGG16, are generated and verified on the Xilinx VC709 platform; experiments show that the performance of the automatically generated accelerators is within 5% of hand-crafted implementations, verifying the effectiveness and efficiency of automatic generation.

· An accelerator design for sequential convolutional neural networks (Chapter 4). Targeting variants of convolutional neural networks for sequential data such as sound, this work studies task-specific application accelerators for intelligent sound recognition. First, by analyzing the parallelism of the multi-scale convolution groups in the 1-Max Pooling CNN, an accelerator based on one-dimensional convolvers is designed and implemented; compared with a traditional convolutional neural network accelerator, it occupies fewer resources and achieves better performance on the same task. Second, a frequency-dimension convolutional network model for voiceprint recognition is proposed, and its structure is optimized against hardware implementation constraints to reduce computational complexity. Finally, based on this model, an application accelerator with a pyramid-shaped layered pipeline structure is designed, and a high throughput for the application task is obtained through joint software-hardware optimization.

· A structured-compression accelerator design for recurrent neural networks (Chapter 5). Based on the block circulant matrix algorithm, this work studies accelerators for recurrent neural networks as represented by LSTM. First, the LSTM model is compressed using block circulant weight matrices. Then, by analyzing the memory-access pattern of the double-layer bidirectional LSTM, an LSTM accelerator based on a fully-pipelined structure is proposed. Further, to overcome the limited application scenarios of dedicated LSTM accelerators, a general LSTM processor capable of computing single-layer LSTMs of different scales is designed; its memory-access structure covers the forward/backward and multi-layer cases, and an instruction set is proposed that can parse any LSTM layer into instructions executed on the general LSTM processor. Both proposed accelerator structures outperform the work of Wang et al. in throughput and provide a more complete structural design.
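The block circulant compression underlying Chapter 5 replaces each k×k sub-block of an LSTM weight matrix with a circulant matrix defined by a single length-k vector, cutting storage per block from k² to k values and letting each block's matrix-vector product be computed with FFTs in O(k log k). The following NumPy sketch illustrates the idea only; the function names and shapes are this editor's own, and the dissertation's contribution is an FPGA hardware realization of this computation, not software:

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the k x k circulant matrix whose first column is c by x.

    Uses the identity C @ x = IFFT(FFT(c) * FFT(x)) (circular
    convolution), so the cost is O(k log k) rather than O(k^2).
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x):
    """Compute W @ x where W is a block circulant weight matrix.

    blocks has shape (m_blocks, n_blocks, k): blocks[i, j] is the
    defining (first-column) vector of the (i, j) circulant block, so
    only k values are stored per k x k block (a k-fold compression).
    """
    m_blocks, n_blocks, k = blocks.shape
    xs = x.reshape(n_blocks, k)
    y = np.zeros(m_blocks * k)
    for i in range(m_blocks):
        for j in range(n_blocks):
            y[i * k:(i + 1) * k] += circulant_matvec(blocks[i, j], xs[j])
    return y

def expand_dense(blocks):
    """Expand the compressed form into an explicit dense matrix.

    Only for verification; an accelerator never materializes this.
    """
    m_blocks, n_blocks, k = blocks.shape
    idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
    return np.block([[blocks[i, j][idx] for j in range(n_blocks)]
                     for i in range(m_blocks)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = rng.standard_normal((4, 2, 8))   # 4x2 grid of 8x8 blocks
    x = rng.standard_normal(2 * 8)
    print(np.allclose(block_circulant_matvec(blocks, x),
                      expand_dense(blocks) @ x))   # True
```

In the LSTM setting, each gate weight matrix (input, forget, output, and cell, plus their recurrent counterparts) would be stored in this compressed block form, and the per-block FFT/multiply/IFFT pipeline is what the accelerator maps onto FPGA logic.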
Keywords/Search Tags: Convolutional neural network, Recurrent neural network, LSTM, FPGA, Layer-folding pipeline, Frequency-dimension convolutional neural network, Block circulant matrix