Design And Research Of General Acceleration Scheme For Deep Learning

Posted on: 2020-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Yu
Full Text: PDF
GTID: 2428330575979779
Subject: Computer system architecture

Abstract/Summary:
In recent years, Artificial Intelligence (AI), a field with many practical applications and active research topics, has become one of the symbols of scientific and technological development in today's society. Deep Learning (DL), the core technology of AI, has attracted wide attention because it can address problems that are hard to describe with formal rules yet are intuitively grasped by humans. With the advent of the big-data era, deep learning has shown great advantages in image recognition, speech recognition, natural language processing, search and recommendation, and other fields, and it is still developing. The Artificial Neural Network (ANN) is the origin of deep learning. From the single-hidden-layer perceptron to deeper architectures that learn features better, such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), deep learning has grown rapidly and continues to be refined in both academia and industry. This progress is due not only to improved software algorithms but also to rapid advances in hardware platforms such as FPGAs and GPUs.

For a deep learning model to learn data characteristics well and predict accurately, it requires not only large input data sets but also a large number of parameters to be adjusted and updated, which places higher demands on the performance improvement and accelerated optimization of the model. As neural networks develop, the growing variety of network types makes it particularly necessary to propose a more general acceleration scheme. The purpose of this thesis is to extract the common parts of hot-spot operations by analyzing the computation of common neural networks, and to accelerate those common parts in hardware by exploiting the characteristics of the hardware platform. The main work of this thesis is as follows:

(1) Study the training and prediction processes of DNN and CNN, and analyze the forward-computation and back-propagation algorithms of the different layer types. An operation that is invoked frequently, involves little logic control, and is computationally tedious is identified as a hot-spot operation. The hot-spot operations analyzed here include the forward weight computation, Batch Normalization (BN), the activation function, convolution, pooling, and the back-propagation computation of each layer.

(2) Analyze the specific procedure of each hot-spot operation and extract the common computation, which is another focus of this thesis. The common parts extracted are matrix multiplication with sparsity and the back-to-back computation of normalization and the activation function (a sketch of this fused common part appears after this list).

(3) Combine the extracted common computation with the characteristics of the FPGA hardware platform and propose an acceleration and optimization scheme suited to those characteristics. The scheme includes quantization that replaces floating-point numbers with fixed-point numbers, batch-data parallelization with pipeline design, and data preloading and caching.

(4) Use Vivado HLS as the platform for synthesis, simulation, and verification. A series of comparative experiments verifies the improvement in acceleration performance and weighs the trade-offs in the measured data, including computation latency and the amount of hardware resources used.
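The "common part" described in items (2) and (3) can be pictured as a single fused kernel: a matrix multiply followed immediately by batch normalization and a ReLU activation. The following is a minimal illustrative sketch, not the author's actual HLS code; the row-major layout and the per-row BN parameter names (gamma, beta, mean, var) are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Sketch of the extracted common part: a matrix multiply (W x X)
// fused with batch normalization and a ReLU activation.
// Layout and parameter names are assumptions for illustration.
void fused_gemm_bn_relu(const float* W,      // M x K weight matrix
                        const float* X,      // K x N input batch
                        float* Y,            // M x N output
                        const float* gamma,  // per-row BN scale
                        const float* beta,   // per-row BN shift
                        const float* mean,   // per-row running mean
                        const float* var,    // per-row running variance
                        std::size_t M, std::size_t K, std::size_t N,
                        float eps = 1e-5f) {
    for (std::size_t m = 0; m < M; ++m) {
        const float inv_std = 1.0f / std::sqrt(var[m] + eps);
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += W[m * K + k] * X[k * N + n];  // GEMM hot spot
            // BN, then ReLU as a select rather than a branch.
            float bn = gamma[m] * (acc - mean[m]) * inv_std + beta[m];
            Y[m * N + n] = std::max(bn, 0.0f);
        }
    }
}
```

On an FPGA the final `std::max` select maps to a multiplexer rather than a conditional branch, which is why, as the thesis argues, ReLU in the common part need not interrupt the pipeline the way a CPU branch does.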
The innovations of this thesis include:

(1) The im2col method is used to transform convolution into matrix multiplication, and the BN algorithm is added to the neural network model so that it participates in the common-part extraction; these two steps let software-level optimizations take part in the hardware acceleration. In addition, the back-propagation algorithm with BN is derived and likewise transformed into matrix multiplication.

(2) The ReLU function is adopted as the activation function, which introduces sparsity into the model, and the common unit is designed by combining this sparsity with the batched matrix multiplication. Because ReLU, as a conditional statement, interrupts pipelined execution on a CPU, it is extracted into the common part and implemented on the FPGA, where a multiplexer (multi-channel data selector) avoids the pipeline interruption.

(3) Bandwidth is reduced by a quantization method that converts floating-point numbers to fixed-point numbers; the BN normalization introduced in this thesis bounds the value range, which facilitates the quantization (see the sketch below). Batch-parallel data input is adopted to make full use of hardware parallelism.

The acceleration scheme, combined with the hardware, improves the performance of batch training and of the prediction model, and the proposed common parts have a wide range of application, which makes this research direction valuable.
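Innovation (3) rests on converting floating-point values to fixed-point before they enter the datapath. Below is a minimal sketch of such a conversion, assuming a signed 16-bit format with a configurable number of fractional bits; the width, rounding, and saturation choices are illustrative and not taken from the thesis.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative float -> signed 16-bit fixed-point quantization (Q7.8).
// FRAC_BITS and the saturation behaviour are assumptions; the thesis
// notes that BN bounds the value range, which is what makes choosing
// a fixed format like this safe.
constexpr int FRAC_BITS = 8;  // hypothetical fractional-bit count

int16_t to_fixed(float x) {
    float scaled = std::round(x * (1 << FRAC_BITS));
    scaled = std::min(scaled, 32767.0f);   // saturate instead of wrapping
    scaled = std::max(scaled, -32768.0f);
    return static_cast<int16_t>(scaled);
}

float to_float(int16_t q) {
    return static_cast<float>(q) / (1 << FRAC_BITS);
}
```

Halving each operand from 32 to 16 bits halves the memory traffic per transfer, which is the bandwidth reduction the abstract refers to.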
Keywords/Search Tags: Deep Learning, neural network, general acceleration, hardware, FPGA