
The Memory Management And Performance Optimization Of Caffe On The Master-slave Accelerator

Posted on: 2016-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: T Xiao    Full Text: PDF
GTID: 2348330536967227    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning has made breakthrough progress in many areas, such as speech recognition and image classification. As the first successfully trained multi-layer network, the convolutional neural network (CNN) is widely used. However, because of the CNN's particular computation pattern, its implementation on general-purpose processors is inefficient and cannot meet performance requirements. Accelerators based on DSPs, FPGAs, and ASICs are therefore developing rapidly, and FPGA-based accelerators in particular have gained popularity among researchers. Cooperation between the CPU and a hardware accelerator on an SoC FPGA to perform computationally intensive tasks offers significant advantages in performance and energy efficiency.

However, current operating systems provide little support for accelerators: the OS is unaware that a computational task can execute either on a CPU core or on an accelerator, and it offers no assistance for efficiently managing data shared between the CPU and the accelerator in DRAM, such as zero copy or data coherence. It is also difficult for current OSes to allocate the large contiguous physical memory regions an accelerator requires.

In this thesis, we select the Xilinx ZYNQ as the target platform and qualitatively analyze methods of sharing data. Building on the high-performance (HP) AXI interfaces of the ZYNQ device, we develop a novel memory management system for an FPGA-based CNN accelerator. It provides a unified virtual address space for the CPU cores and the accelerator, so that both can access the same memory region from operating-system user space while data consistency is guaranteed.

To put the accelerator into practice, we chose the popular deep learning framework Caffe and carried out an evaluation. For Caffe's computational bottleneck, we use the convolutional neural network accelerator, greatly reducing the program's execution time. In the acceleration process, we mapped the convolution operation in Caffe onto the hardware accelerator and blocked the matrix computation inside the convolution. For the convolution computing core, the accelerator achieves a peak speedup of 4.8; for the entire Caffe application, we achieve a full-application speedup of 2.74. We also built a demonstration system for image classification based on the accelerator prototype; classification results can be displayed on a monitor through the development platform's VGA interface.
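The consistency guarantee described above can be illustrated with a minimal sketch in C, modeled loosely on the lazy-synchronization idea of Caffe's SyncedMemory. Because the real design relies on ZYNQ hardware (HP AXI ports, contiguous DMA buffers), the accelerator's view is simulated here with a second array; all names (`shared_buf`, `sb_mutable_cpu`, `sb_accel`, `head_t`) are illustrative assumptions, not the thesis's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Who holds the most recent copy of the data. */
typedef enum { HEAD_AT_CPU, HEAD_AT_ACCEL, SYNCED } head_t;

typedef struct {
    float *cpu;   /* CPU-side view of the buffer                      */
    float *accel; /* stand-in for the accelerator-visible DMA buffer  */
    size_t n;     /* element count                                    */
    head_t head;  /* tracks which side last wrote                     */
} shared_buf;

shared_buf *sb_new(size_t n) {
    shared_buf *b = malloc(sizeof *b);
    b->cpu   = calloc(n, sizeof(float));
    b->accel = calloc(n, sizeof(float));
    b->n     = n;
    b->head  = SYNCED;
    return b;
}

/* CPU writes: mark the CPU copy as the newest; no copy happens yet. */
float *sb_mutable_cpu(shared_buf *b) {
    b->head = HEAD_AT_CPU;
    return b->cpu;
}

/* Accelerator reads: synchronize lazily, only if the CPU copy is newer. */
const float *sb_accel(shared_buf *b) {
    if (b->head == HEAD_AT_CPU)
        memcpy(b->accel, b->cpu, b->n * sizeof(float));
    b->head = SYNCED;
    return b->accel;
}

void sb_free(shared_buf *b) {
    free(b->cpu);
    free(b->accel);
    free(b);
}
```

A typical sequence is `sb_mutable_cpu` to fill inputs, then `sb_accel` before launching the accelerator; the copy (or, on real hardware, a cache flush) is deferred until the hand-off, which is what makes repeated same-side accesses cheap.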
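The blocking of the matrix computation can likewise be sketched: after convolution is lowered to a matrix product (as Caffe does via im2col), the product is computed in fixed-size tiles that would each fit the accelerator's on-chip buffers. The tile size and function names below are illustrative assumptions, not the thesis's actual parameters.

```c
#include <assert.h>
#include <string.h>

#define TILE 4 /* illustrative tile size; real designs size this to BRAM */

/* Naive reference GEMM: C = A (MxK) * B (KxN), row-major. */
void gemm_naive(int M, int N, int K,
                const float *A, const float *B, float *C) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float s = 0.0f;
            for (int k = 0; k < K; k++)
                s += A[i * K + k] * B[k * N + j];
            C[i * N + j] = s;
        }
}

/* Blocked GEMM: each TILE x TILE block of C is accumulated from TILE-wide
 * panels of A and B, mirroring how one block at a time would be streamed
 * to the accelerator and its partial result accumulated. */
void gemm_blocked(int M, int N, int K,
                  const float *A, const float *B, float *C) {
    memset(C, 0, (size_t)M * N * sizeof(float));
    for (int ii = 0; ii < M; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < K; kk += TILE)
                for (int i = ii; i < M && i < ii + TILE; i++)
                    for (int j = jj; j < N && j < jj + TILE; j++) {
                        float s = C[i * N + j];
                        for (int k = kk; k < K && k < kk + TILE; k++)
                            s += A[i * K + k] * B[k * N + j];
                        C[i * N + j] = s;
                    }
}
```

Both routines compute the same product; the blocked ordering simply bounds the working set touched between partial-result writes, which is the property that lets a fixed-capacity hardware core process arbitrarily large convolution matrices.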
Keywords/Search Tags:Convolutional Neural Network Accelerator, Shared Memory, Data Consistency, Caffe