
Design And Implementation Of Benchmarks For Deep Learning Processors

Posted on: 2020-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Xu
GTID: 2428330578483124
Subject: Computer system architecture
Abstract/Summary:
In recent years, convolutional neural networks (CNNs), the most important class of deep learning algorithms, have received extensive attention in industry and research, and they play an especially important role in fields such as computer vision. Because deeper networks tend to achieve better results, CNNs have become increasingly complex. As network structures deepen and training data grows, general-purpose processors can no longer adequately meet the computational demands of these applications, so chip architectures are evolving toward designs tailored to them. A series of dedicated deep learning chips has been proposed, the best known being Cambricon's DianNao series and Google's TPU. These chips accelerate CNNs with specialized hardware, and some even provide efficient application-specific instruction sets for them. In this sense, deep learning processors are accelerators for convolutional neural networks.

Standard benchmarks and evaluation metrics are crucial in processor design. This thesis proposes a benchmark suite for deep learning processors. The benchmarks aim to evaluate deep learning hardware objectively, to judge whether a processor design is sound, to compare different processors, and to guide system optimization at both the hardware and software levels. Hardware researchers can use them to design effective deep learning processors. The main work and results of this thesis are as follows:

(1) We establish principles for selecting applications for deep learning benchmarks and, on that basis, choose 20 representative and popular convolutional neural networks. Together with classic datasets, these networks constitute the macrobenchmarks. To simplify performance evaluation, we also extract the core layers of these networks and their common configurations; paired with classic input sets of different sizes, these layers constitute the microbenchmarks. The microbenchmarks consist of 45 compact network-layer modules and greatly reduce the amount of code. Having fixed the composition of the suite, we implement it on general-purpose processors, including general-purpose CPUs and GPUs, and on the Chinese-developed Sunway processor.

(2) For the networks in the macrobenchmarks, we give a detailed analysis of their computation cost, parameter count, topology, and the time consumed by each of their components. We explain the factors that influence network communication overhead and identify the performance bottlenecks in training. For the layers in the microbenchmarks, we analyze the implementation details of their forward and backward passes and derive the basic operations they contain. These results provide an important foundation for designing special-purpose instruction sets for deep learning.

(3) Finally, we show how to evaluate performance with the benchmarks through an example evaluation on real hardware platforms. For the macrobenchmarks, this thesis provides a set of system-level metrics, including I/O wait, cross-node communication, and CPU utilization, and uses them to characterize the networks and identify their system-level performance bottlenecks. For the microbenchmarks, it provides a set of microarchitecture-level metrics, including IPC, CPU stall ratio, branch misprediction ratio, and cache miss rate, among others, and uses them to characterize the network layers and identify their microarchitecture-level bottlenecks. We analyze the evaluation results and offer advice for designing and optimizing processors.
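The abstract mentions analyzing each network's computation cost and parameter count but does not give the formulas. The following is a minimal sketch of the usual per-layer accounting, assuming a standard (non-grouped) 2D convolution with a square k x k kernel, a fully connected layer applied to one input vector, and the common convention that one multiply-accumulate (MAC) counts as two FLOPs. The function names are hypothetical and not taken from the thesis.

```python
def conv2d_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    # Standard output-size formula along one spatial dimension.
    return (in_size + 2 * padding - kernel) // stride + 1


def conv2d_cost(c_in: int, h_in: int, w_in: int, c_out: int, k: int,
                stride: int = 1, padding: int = 0,
                bias: bool = True) -> tuple[int, int, int]:
    """Return (parameters, MACs, FLOPs) of a non-grouped k x k 2D convolution."""
    h_out = conv2d_output_size(h_in, k, stride, padding)
    w_out = conv2d_output_size(w_in, k, stride, padding)
    # One k x k filter per (output channel, input channel) pair, plus one bias per output channel.
    params = c_out * c_in * k * k + (c_out if bias else 0)
    # Every output element needs c_in * k * k multiply-accumulates.
    macs = c_out * h_out * w_out * c_in * k * k
    # Assumed convention: 1 MAC = 2 FLOPs; bias additions are not counted.
    return params, macs, 2 * macs


def linear_cost(in_features: int, out_features: int,
                bias: bool = True) -> tuple[int, int, int]:
    """Return (parameters, MACs, FLOPs) of a fully connected layer for one input vector."""
    params = in_features * out_features + (out_features if bias else 0)
    macs = in_features * out_features
    return params, macs, 2 * macs


if __name__ == "__main__":
    # Illustrative layer: 3 -> 64 channels, 3 x 3 kernel, stride 1, padding 1, 224 x 224 input.
    print(conv2d_cost(3, 224, 224, 64, 3, stride=1, padding=1))
    # -> (1792, 86704128, 173408256)
```

Summing these per-layer figures over a network gives whole-model totals of the kind such an analysis reports. Batch size multiplies the MAC and FLOP counts but not the parameter count.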
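The abstract does not say how system-level metrics such as CPU utilization and I/O wait are collected. One possible approach on Linux, assumed here rather than taken from the thesis, is to take two snapshots of the aggregate `cpu` line in `/proc/stat`. Its first eight fields are user, nice, system, idle, iowait, irq, softirq, and steal time in clock ticks, and the metrics are computed from the differences between snapshots. The helper names below are hypothetical.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
    # Older kernels may report fewer fields; treat missing ones as zero.
    values += [0] * (len(CPU_FIELDS) - len(values))
    return dict(zip(CPU_FIELDS, values))


def cpu_metrics(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """CPU utilization and I/O-wait share over the interval between two snapshots.

    guest/guest_nice are ignored because the kernel already counts them in user/nice.
    """
    delta = {f: after[f] - before[f] for f in CPU_FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {"cpu_utilization": 0.0, "iowait_ratio": 0.0}
    # Time spent idle or waiting on I/O is not counted as busy time.
    idle = delta["idle"] + delta["iowait"]
    return {
        "cpu_utilization": (total - idle) / total,
        "iowait_ratio": delta["iowait"] / total,
    }


def read_cpu_snapshot(path: str = "/proc/stat") -> dict[str, int]:
    # The aggregate 'cpu' line is the first line of /proc/stat.
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

In use, `read_cpu_snapshot()` would be called once before and once after a benchmark run, and `cpu_metrics(before, after)` would be called on the two results. Cross-node communication, the third system metric named above, would need a separate source, such as network interface counters, and is not covered by this sketch.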
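The microarchitecture metrics listed for the microbenchmarks are ratios of hardware event counts. The following is a minimal sketch, assuming the raw counts have already been collected. One source is Linux `perf stat`, whose generic events include `cycles`, `instructions`, `branches`, `branch-misses`, `cache-references`, and `cache-misses`. The thesis's exact definitions, especially of the stall ratio, may differ. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw hardware event counts for one benchmark run (field names are illustrative)."""
    cycles: int
    instructions: int
    stalled_cycles: int      # cycles in which the pipeline stalled; which event to use is an assumption
    branches: int
    branch_misses: int
    cache_references: int
    cache_misses: int


def _ratio(num: int, den: int) -> float:
    # Avoid ZeroDivisionError when an event was not counted.
    return num / den if den else 0.0


def microarch_metrics(s: CounterSample) -> dict[str, float]:
    """Derive common per-run microarchitecture metrics from raw event counts."""
    return {
        "ipc": _ratio(s.instructions, s.cycles),
        "stall_ratio": _ratio(s.stalled_cycles, s.cycles),
        "branch_mispredict_ratio": _ratio(s.branch_misses, s.branches),
        "cache_miss_rate": _ratio(s.cache_misses, s.cache_references),
    }


if __name__ == "__main__":
    sample = CounterSample(cycles=1_000_000, instructions=1_500_000, stalled_cycles=250_000,
                           branches=200_000, branch_misses=5_000,
                           cache_references=50_000, cache_misses=10_000)
    print(microarch_metrics(sample))
    # -> {'ipc': 1.5, 'stall_ratio': 0.25, 'branch_mispredict_ratio': 0.025, 'cache_miss_rate': 0.2}
```

Keeping the raw counts separate from the derived ratios means that a different stall or cache-miss definition only changes `microarch_metrics`, not the data collection.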
Keywords: Benchmark, Convolutional Neural Network, Network Layer, Basic Operation, Performance Analysis