
Implementation And Optimization Of Fully Connected Neural Network On FPGA

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
Full Text: PDF
GTID: 2428330545952505
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of semiconductor technology, computing power has improved greatly, and deep neural networks have once again entered a wave of rapid development. The fully connected neural network can handle large-scale data and extract features from all of the input, so it still plays a role in image recognition. Training a fully connected neural network is complex and computationally expensive, so practitioners usually train on GPUs with high computing power. Forward inference, however, is computationally simpler and demands high real-time performance, for which the GPU is not a good fit. Although the computing power of FPGAs cannot match that of GPUs, their real-time behavior and low power consumption make them well suited to the forward pass of neural networks.

In this paper, we propose a method for implementing fully connected neural networks efficiently on FPGAs with OpenCL, and we explore its performance on a commercial fully connected neural network. The work also serves as a reference for related efforts. The main work and research results cover two aspects:

(1) Algorithm implementation and optimization of the fully connected layer. Traditional FPGA development uses a hardware description language, which requires developers to be familiar with the FPGA hardware; it is also cumbersome to debug and has a long development cycle. The OpenCL framework brings new possibilities for FPGA development: it is a complete framework that developers can use to program FPGAs. This paper analyzes the computational hotspots of fully connected neural networks and proposes algorithm designs and optimization schemes for the two hotspots. For the fully connected layer, we combine offsets to regularize the calculation tasks, exploit parallelism through group division, and reuse data to improve data efficiency and reduce memory-access pressure (see the kernel sketch below). For complex activation functions (such as the Sigmoid function), we analyze the common implementation methods and evaluate which is appropriate for an FPGA. Based on the characteristics of the activation function, we design a differential lookup table that implements the activation function without any loss of accuracy, and we compress the lookup table to save storage space in the system (a sketch of this idea also follows below).

(2) System optimization. To maximize the utilization of the FPGA's resources, we further analyze resource usage, pipelining, and memory access, and optimize the system with data rearrangement, single-instruction multiple-data (SIMD) execution, multiple pipelines, and a semi-elaboration method. Additional strategies balance resource occupancy in the system so that the circuit scale can be expanded and system performance improved further. The optimized version achieves a 2.19x speedup at a 380 MHz system frequency, using 92% of the RAM blocks and 42% of the DSP blocks.
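As an illustration of the data-reuse idea in (1), the following is a minimal OpenCL C sketch, not the thesis's actual kernel: one work-group stages a tile of the input vector in local memory so that all output neurons handled by the group reuse it, reducing global-memory traffic. The kernel name, TILE size, and argument layout are assumptions made for this example.

// Hypothetical OpenCL C kernel: one work-item accumulates one output neuron,
// and the work-group shares each staged input tile (data reuse).
#define TILE 256

__kernel void fc_layer(__global const float *restrict weights, // [num_out][num_in]
                       __global const float *restrict input,   // [num_in]
                       __global const float *restrict bias,    // [num_out]
                       __global float *restrict output,        // [num_out]
                       const int num_in,
                       const int num_out)
{
    __local float in_tile[TILE];

    const int out_idx = get_global_id(0);   // output neuron for this work-item
    const int lid     = get_local_id(0);
    const int lsize   = get_local_size(0);

    float acc = 0.0f;

    // Walk the input vector tile by tile; every work-item reuses the same
    // tile staged in local memory instead of re-reading global memory.
    for (int base = 0; base < num_in; base += TILE) {
        for (int i = lid; i < TILE && (base + i) < num_in; i += lsize)
            in_tile[i] = input[base + i];
        barrier(CLK_LOCAL_MEM_FENCE);

        if (out_idx < num_out) {
            const int limit = min(TILE, num_in - base);
            for (int i = 0; i < limit; ++i)
                acc += weights[out_idx * num_in + base + i] * in_tile[i];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (out_idx < num_out)
        output[out_idx] = acc + bias[out_idx];
}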
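The differential lookup table for the Sigmoid function can be sketched as follows. Because the abstract does not give the exact table layout, this assumes a piecewise scheme that stores a base value per segment plus the difference to the next entry and interpolates linearly within the segment; SIG_SEGMENTS, the input range, and the helper name are illustrative.

// Hypothetical OpenCL C helper: differential lookup table for Sigmoid.
// sig_base[i]  = sigmoid(SIG_MIN + i * step)
// sig_delta[i] = sig_base[i + 1] - sig_base[i]
// (both filled offline on the host and passed in as __constant buffers)
#define SIG_SEGMENTS 256
#define SIG_MIN     (-8.0f)
#define SIG_MAX     ( 8.0f)

inline float sigmoid_lut(float x,
                         __constant const float *sig_base,
                         __constant const float *sig_delta)
{
    const float step = (SIG_MAX - SIG_MIN) / SIG_SEGMENTS;

    // Saturate outside the table range, where sigmoid is effectively 0 or 1.
    if (x <= SIG_MIN) return 0.0f;
    if (x >= SIG_MAX) return 1.0f;

    const float pos  = (x - SIG_MIN) / step;   // position in table units
    const int   idx  = (int)pos;               // segment index
    const float frac = pos - (float)idx;       // fraction inside the segment

    // Base value plus the scaled stored difference to the next entry.
    return sig_base[idx] + frac * sig_delta[idx];
}

Storing the differences alongside the base values means the interpolation costs only one multiply-add after the table reads, which maps naturally onto a single DSP block per evaluation.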
Keywords/Search Tags: FPGA, OpenCL, Fully Connected Neural Network, Algorithm Optimization, System Optimization