Design And Implementation Of Benchmarks For Deep Learning Processors

Posted on:2020-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Xu

Full Text:PDF

GTID:2428330578483124

Subject:Computer system architecture

Abstract/Summary:

In recent years, convolutional neural networks, among the most important deep learning algorithms, have received extensive attention and research in industry, and they play an especially important role in fields such as computer vision. Deeper networks tend to give better results, so convolutional neural networks have become more and more complex. As network structures deepen and training data grows, general-purpose processors can no longer meet the computing requirements of such applications well. Chip architecture is therefore evolving to suit these applications. A series of dedicated deep learning chips has been proposed, of which the Cambricon DianNao series and the Google TPU are the best known. They provide specialized acceleration hardware and even efficient application-specific instruction sets for convolutional neural networks. In this sense, deep learning processors are accelerators for convolutional neural networks. Standard benchmarks and evaluation metrics are crucial when designing processors. This paper proposes a set of benchmarks for deep learning processors. The benchmarks aim to evaluate deep learning hardware objectively, judge the soundness of processor designs, compare different processors, and guide optimization of the system at both the hardware and software levels. Hardware researchers can design effective deep learning processors based on these benchmarks. The work and research results of this paper mainly include:

(1) We determine the principles for selecting applications for deep learning benchmarks and choose 20 representative and popular convolutional neural networks. These networks, together with classic data sets, constitute the macrobenchmarks. For convenient performance evaluation, the core network layers of convolutional neural networks and their common configurations are extracted. These network layers, together with classic input sets of different sizes, constitute the microbenchmarks. The microbenchmarks consist of 45 compact network layer modules and greatly reduce the amount of code. After determining the composition of the benchmarks, we implement them on general-purpose processors, including general-purpose CPUs and GPUs and the domestic Sunway processor.

(2) For the networks in the macrobenchmarks, a detailed analysis is given in terms of computation amount, parameter amount, topological structure and the time consumption of their components. We explain the factors that influence network communication overhead and identify the performance bottleneck in the process of training networks. For the network layers in the microbenchmarks, we analyse the implementation details of their forward and backward processes and derive the basic operations they contain. This result provides an important foundation for a dedicated deep learning instruction set.

(3) Finally, we show how to evaluate performance with the benchmarks by giving an example evaluation on real hardware platforms. For the macrobenchmarks, this paper provides a set of system metrics including I/O waiting, cross-node communication and CPU utilization, from which we obtain the system-level characteristics and performance bottlenecks of these networks. For the microbenchmarks, this paper provides a set of microarchitecture metrics including IPC, CPU stall ratio, branch misprediction ratio and cache miss rate, from which we obtain the microarchitecture-level characteristics and performance bottlenecks of these network layers. We analyse the performance evaluation results and give some advice for designing and optimizing processors.
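
As an illustration of the computation amount and parameter amount analysed for network layers, the following minimal sketch counts the parameters and forward-pass multiply-accumulate operations of a single square convolution layer using the standard textbook formulas. It is not taken from the thesis: the names `ConvLayer`, `output_size`, `parameter_count` and `forward_macs` are hypothetical, and the example layer (3 input channels, 96 output channels, 11x11 kernel, stride 4, 227x227 input) is an assumed AlexNet-style configuration rather than one of the 45 microbenchmark modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of a square 2-D convolution layer."""
    in_channels: int
    out_channels: int
    kernel: int          # kernel height == kernel width
    stride: int = 1
    padding: int = 0
    bias: bool = True


def output_size(layer: ConvLayer, in_size: int) -> int:
    """Spatial output size for a square input of side in_size."""
    return (in_size + 2 * layer.padding - layer.kernel) // layer.stride + 1


def parameter_count(layer: ConvLayer) -> int:
    """Number of learnable weights, plus one bias per output channel."""
    weights = layer.kernel * layer.kernel * layer.in_channels * layer.out_channels
    return weights + (layer.out_channels if layer.bias else 0)


def forward_macs(layer: ConvLayer, in_size: int) -> int:
    """Multiply-accumulate operations for one forward pass on one image."""
    out = output_size(layer, in_size)
    per_output_pixel = layer.kernel * layer.kernel * layer.in_channels
    return out * out * layer.out_channels * per_output_pixel


# Assumed AlexNet-style first layer, used only as an example input.
example = ConvLayer(in_channels=3, out_channels=96, kernel=11, stride=4)
print(output_size(example, 227))
print(parameter_count(example))
print(forward_macs(example, 227))
```

Counting multiply-accumulates rather than separate multiplications and additions is a design choice of this sketch; doubling the result gives an approximate floating-point operation count.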

Keywords/Search Tags:

Benchmark, Convolutional Neural Network, Network Layer, Basic Operation, Performance Analysis

Related items

1	Research On Key Technologies Of High Performance Accelerator For Convolution And Recurrent Neural Networks
2	Research On Construction And Application Of Lightweight Neural Network
3	Research On The Mechanism Of Basic Elements Of Convolutional Neural Network
4	Research On Face Detection Based On Two-layer Cascaded Convolutional Neural Network
5	Research And Application Of Mobile Network Characteristics Based On Cell Performance
6	Research On Multi-layer Neural Network Model Updating Facing Network Traffic
7	Research On Convolutional Neural Network For Graphs
8	OpenCL Accelerated Deep Convolutional Neural Networks Inference And Performance Model
9	Design And Implementation Of A High-performance Accelerator Dedicated For Convolutional Neural Networks
10	Generating And Analysis Of Energy Consumption Benchmark For Telecommunication Base Station Based On Data Mining