
Implementation and Optimization of a Convolutional Neural Network Library on the Sunway Platform

Posted on: 2020-04-11  Degree: Master  Type: Thesis
Country: China  Candidate: J M Shu  Full Text: PDF
GTID: 2428330572488159  Subject: Computer system architecture
Abstract/Summary:
Deep convolutional neural networks (CNNs) play a vital role in the field of image recognition. As problem complexity increases, CNNs need larger models to improve accuracy: studies have shown that CNN depth has grown from the dozens of layers seen in the early ImageNet competitions to thousands of layers today. This increase in depth causes a surge in computational cost and therefore demands more powerful hardware to train an entire network. Sunway TaihuLight is the world's third-ranked supercomputer. Its floating-point performance is provided by the Sunway 26010 many-core processor: a single processor has a peak double-precision floating-point performance of 3.06 TFlops, and the whole machine reaches a theoretical 125 PFlops. Using Sunway TaihuLight for neural network training can alleviate the current shortage of computing power for CNNs and promote the use of domestic processors in the field of artificial intelligence. On commercial platforms such as NVIDIA GPUs and Intel CPUs, CNN models are implemented on top of deep learning libraries, whereas Sunway TaihuLight lacks a basic convolutional neural network library. This paper therefore designs and implements the convolutional neural network library swABL-DNN (Sunway Application Boost Library-DNN) for the Sunway processor, implements a test framework for swABL-DNN, and proposes a parallel convolution algorithm tailored to the Sunway processor architecture. The main work of this paper is as follows:

1. swABL-DNN API design and features. Since cuDNN is not open source, we analyzed typical convolutional neural networks and the backpropagation algorithm to determine the set of APIs swABL-DNN must provide, designed data structures and API naming similar to cuDNN (CUDA Deep Neural Network library), and then implemented swABL-DNN. All swABL-DNN APIs support single- and double-precision floating-point computation, and data is stored in 4-dimensional tensors. Descriptors are modified to configure the core computation of each function, and most APIs operate on data held in buffers, so the basic operations can be integrated into other frameworks easily; a sketch of this descriptor-driven calling pattern is given below.
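The abstract does not list swABL-DNN's actual symbol names, so the following minimal sketch uses the real cuDNN calls that swABL-DNN is modeled on; an swABL-DNN program would presumably follow the same descriptor-create / descriptor-set / compute sequence under its own prefix. The shapes correspond to an AlexNet-like first convolution layer, and error checking and cleanup are elided.

    /* Descriptor-driven convolution in the cuDNN style that the
     * abstract says swABL-DNN mirrors.  Real cuDNN API, illustrative
     * shapes; not swABL-DNN's own (unpublished) symbol names. */
    #include <cudnn.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        const int n = 1, c = 3, h = 224, w = 224;  /* NCHW input      */
        const int k = 64, r = 11, s = 11;          /* filter bank     */

        cudnnHandle_t handle;
        cudnnCreate(&handle);

        /* 4-D tensor descriptors: swABL-DNN likewise stores all data
         * as 4-dimensional tensors. */
        cudnnTensorDescriptor_t xDesc, yDesc;
        cudnnCreateTensorDescriptor(&xDesc);
        cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                   CUDNN_DATA_FLOAT, n, c, h, w);

        cudnnFilterDescriptor_t wDesc;
        cudnnCreateFilterDescriptor(&wDesc);
        cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                   CUDNN_TENSOR_NCHW, k, c, r, s);

        cudnnConvolutionDescriptor_t convDesc;
        cudnnCreateConvolutionDescriptor(&convDesc);
        cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 4, 4, 1, 1,
                                        CUDNN_CROSS_CORRELATION,
                                        CUDNN_DATA_FLOAT);

        /* Output shape is derived from the descriptors, not hard-coded. */
        int on, oc, oh, ow;
        cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                              &on, &oc, &oh, &ow);
        cudnnCreateTensorDescriptor(&yDesc);
        cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                   CUDNN_DATA_FLOAT, on, oc, oh, ow);

        float *x, *wts, *y;
        cudaMalloc((void **)&x,   sizeof(float) * n * c * h * w);
        cudaMalloc((void **)&wts, sizeof(float) * k * c * r * s);
        cudaMalloc((void **)&y,   sizeof(float) * on * oc * oh * ow);

        /* One generic call serves any layer whose descriptors are set,
         * which is the integration property the abstract cites. */
        const float alpha = 1.0f, beta = 0.0f;
        cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, wts,
                                convDesc,
                                CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
                                NULL, 0, &beta, yDesc, y);

        printf("output: %d x %d x %d x %d\n", on, oc, oh, ow);
        /* ... destroy descriptors, free buffers ... */
        return 0;
    }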
2. swABL-DNN testing and analysis. We designed and implemented a basic test program that is used to verify the functionality of swABL-DNN and to analyze its APIs. The program can store network models, run training and prediction, and measure the share of total runtime each library API takes across a whole neural network. The analysis shows that swABL-DNN can support the implementation of AlexNet, GoogLeNet, VGG, LeNet-5, and U-Net, and the test results show that the convolutional-layer APIs account for 90% of the runtime in AlexNet and U-Net.

3. Optimization of the convolution library functions. The core of the convolution library functions is a special form of batched convolution. On the commercial platforms (NVIDIA GPUs and Intel CPUs) this batched convolution is transformed into a matrix multiplication, the so-called GEMM (General Matrix Multiply) algorithm, but on the Sunway processor the GEMM algorithm is very inefficient. swCaffe and swDNN proposed a parallel convolution algorithm for large batch sizes, but it restricts the convolution parameters; when those constraints are not satisfied they fall back to the GEMM algorithm, resulting in low overall performance. This paper analyzes why the GEMM algorithm is inefficient on the Sunway processor and why the parameters of the swCaffe parallel convolution algorithm are limited. Based on the main core + slave core array architecture, we implemented and optimized a parallel convolution algorithm without parameter restrictions. The proposed parallel convolution algorithm is based on the direct algorithm and is optimized with data reuse, software pipelining, and manual vectorization; a serial sketch of the underlying loop nest is given below. It achieves a 4.0x to 4.5x performance improvement over the GEMM algorithm, but is slower than the swCaffe parallel convolution algorithm in the cases where the latter's parameter constraints are met.
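As a reference point for the direct algorithm that the parallel implementation builds on, here is a minimal serial sketch in C. The function name, the NCHW layout, and the stride/padding handling are assumptions for illustration; the optimizations the abstract describes (distributing work across the slave-core array, staging tiles in local memory for data reuse, software-pipelined transfers, and manual SIMD) are noted in comments but not implemented here.

    /* Minimal serial sketch of direct convolution.  On the Sunway
     * processor, the batch/output-channel loops would be distributed
     * over the slave-core array, input and filter tiles staged into
     * each core's local memory for reuse, transfers overlapped with
     * compute via software pipelining, and the innermost accumulation
     * done with manual SIMD vectorization. */
    #include <stddef.h>

    /* Row-major index into a 4-D array with trailing dims B, C, D. */
    static inline size_t idx4(size_t a, size_t b, size_t c_, size_t d,
                              size_t B, size_t C, size_t D) {
        return ((a * B + b) * C + c_) * D + d;
    }

    void conv2d_direct(const float *x,  /* N x C x H x W   input   */
                       const float *wt, /* K x C x R x S   filters */
                       float *y,        /* N x K x OH x OW output  */
                       int N, int C, int H, int W,
                       int K, int R, int S,
                       int stride, int pad) {
        int OH = (H + 2 * pad - R) / stride + 1;
        int OW = (W + 2 * pad - S) / stride + 1;
        for (int n = 0; n < N; n++)        /* batch           */
        for (int k = 0; k < K; k++)        /* output channel  */
        for (int oh = 0; oh < OH; oh++)
        for (int ow = 0; ow < OW; ow++) {
            float acc = 0.0f;
            for (int c = 0; c < C; c++)    /* input channel   */
            for (int r = 0; r < R; r++)
            for (int s = 0; s < S; s++) {
                int ih = oh * stride - pad + r;
                int iw = ow * stride - pad + s;
                if (ih < 0 || ih >= H || iw < 0 || iw >= W)
                    continue;              /* zero padding    */
                acc += x[idx4(n, c, ih, iw, C, H, W)]
                     * wt[idx4(k, c, r, s, C, R, S)];
            }
            y[idx4(n, k, oh, ow, K, OH, OW)] = acc;
        }
    }

By contrast, the GEMM approach first unrolls input patches into a matrix (im2col) and multiplies it by the flattened filter matrix; the thesis's finding is that this lowering performs poorly on the Sunway memory hierarchy, which motivates optimizing the direct form above instead.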
Keywords/Search Tags: Sunway TaihuLight, Convolutional Neural Network, Library, Parallel Convolution Algorithm, SIMD Data Parallelism, Optimization