
The Memory Management And Performance Optimization Of Caffe On The Master-slave Accelerator

Posted on: 2016-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: T Xiao    Full Text: PDF
GTID: 2348330536967227    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning has made breakthrough progress in many areas, such as speech recognition and image classification. As the first successfully trained multi-layer network, the convolutional neural network (CNN) is widely used. However, because of the CNN's particular computation pattern, its implementation on general-purpose processors is inefficient and cannot meet performance requirements. Accelerators based on DSPs, FPGAs, and ASICs are therefore developing rapidly, and FPGA-based accelerators in particular have gained popularity among researchers. Cooperation between the CPU and a hardware accelerator on an SoC FPGA to perform computationally intensive tasks offers significant advantages in performance and energy efficiency.

However, current operating systems provide little support for accelerators: the OS is unaware that a computational task can execute either on a CPU core or on an accelerator, and it offers no assistance for efficiently managing data shared between the CPU and the accelerator in DRAM, such as zero copy or data coherence. It is also difficult for current OSes to allocate the large contiguous physical memory regions an accelerator requires.

In this thesis, we select the Xilinx ZYNQ as the target platform and qualitatively analyze methods of sharing data. Building on the high-performance (HP) AXI interfaces of the ZYNQ device, we develop a novel memory management system for an FPGA-based CNN accelerator. It provides a unified virtual address space for the CPU cores and the accelerator, so that both can access the same memory region from operating-system user space while data consistency is guaranteed.

To put the accelerator into practice, we chose the popular deep learning framework Caffe and carried out an evaluation. For Caffe's computational bottleneck, we use the convolutional neural network accelerator, greatly reducing the program's execution time. In the acceleration process, we mapped the convolution operation in Caffe onto the hardware accelerator and blocked the matrix computation inside the convolution. For the convolution computing core, the accelerator achieves a peak speedup of 4.8; for the entire Caffe application, we achieve a full-application speedup of 2.74. We also built a demonstration system for image classification based on the accelerator prototype; classification results can be displayed on a monitor through the development platform's VGA interface.
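The consistency guarantee described above can be illustrated with a minimal sketch in C, modeled loosely on the lazy-synchronization idea of Caffe's SyncedMemory. Because the real design relies on ZYNQ hardware (HP AXI ports, contiguous DMA buffers), the accelerator's view is simulated here with a second array; all names (`shared_buf`, `sb_mutable_cpu`, `sb_accel`, `head_t`) are illustrative assumptions, not the thesis's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Who holds the most recent copy of the data. */
typedef enum { HEAD_AT_CPU, HEAD_AT_ACCEL, SYNCED } head_t;

typedef struct {
    float *cpu;   /* CPU-side view of the buffer                      */
    float *accel; /* stand-in for the accelerator-visible DMA buffer  */
    size_t n;     /* element count                                    */
    head_t head;  /* tracks which side last wrote                     */
} shared_buf;

shared_buf *sb_new(size_t n) {
    shared_buf *b = malloc(sizeof *b);
    b->cpu   = calloc(n, sizeof(float));
    b->accel = calloc(n, sizeof(float));
    b->n     = n;
    b->head  = SYNCED;
    return b;
}

/* CPU writes: mark the CPU copy as the newest; no copy happens yet. */
float *sb_mutable_cpu(shared_buf *b) {
    b->head = HEAD_AT_CPU;
    return b->cpu;
}

/* Accelerator reads: synchronize lazily, only if the CPU copy is newer. */
const float *sb_accel(shared_buf *b) {
    if (b->head == HEAD_AT_CPU)
        memcpy(b->accel, b->cpu, b->n * sizeof(float));
    b->head = SYNCED;
    return b->accel;
}

void sb_free(shared_buf *b) {
    free(b->cpu);
    free(b->accel);
    free(b);
}
```

A typical sequence is `sb_mutable_cpu` to fill inputs, then `sb_accel` before launching the accelerator; the copy (or, on real hardware, a cache flush) is deferred until the hand-off, which is what makes repeated same-side accesses cheap.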
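The blocking of the matrix computation can likewise be sketched: after convolution is lowered to a matrix product (as Caffe does via im2col), the product is computed in fixed-size tiles that would each fit the accelerator's on-chip buffers. The tile size and function names below are illustrative assumptions, not the thesis's actual parameters.

```c
#include <assert.h>
#include <string.h>

#define TILE 4 /* illustrative tile size; real designs size this to BRAM */

/* Naive reference GEMM: C = A (MxK) * B (KxN), row-major. */
void gemm_naive(int M, int N, int K,
                const float *A, const float *B, float *C) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float s = 0.0f;
            for (int k = 0; k < K; k++)
                s += A[i * K + k] * B[k * N + j];
            C[i * N + j] = s;
        }
}

/* Blocked GEMM: each TILE x TILE block of C is accumulated from TILE-wide
 * panels of A and B, mirroring how one block at a time would be streamed
 * to the accelerator and its partial result accumulated. */
void gemm_blocked(int M, int N, int K,
                  const float *A, const float *B, float *C) {
    memset(C, 0, (size_t)M * N * sizeof(float));
    for (int ii = 0; ii < M; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < K; kk += TILE)
                for (int i = ii; i < M && i < ii + TILE; i++)
                    for (int j = jj; j < N && j < jj + TILE; j++) {
                        float s = C[i * N + j];
                        for (int k = kk; k < K && k < kk + TILE; k++)
                            s += A[i * K + k] * B[k * N + j];
                        C[i * N + j] = s;
                    }
}
```

Both routines compute the same product; the blocked ordering simply bounds the working set touched between partial-result writes, which is the property that lets a fixed-capacity hardware core process arbitrarily large convolution matrices.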
Keywords/Search Tags:Convolutional Neural Network Accelerator, Shared Memory, Data Consistency, Caffe