
Scaling The Training Of Recurrent Neural Networks On Sunway Taihulight

Posted on: 2020-03-23    Degree: Master    Type: Thesis
Country: China    Candidate: O Y Li    Full Text: PDF
GTID: 2428330626964646    Subject: Ecology
Abstract/Summary:
Recurrent neural networks are an important component of deep learning and are widely used in sequence processing problems. At first, recurrent neural networks were mostly applied to natural language processing, but as their performance and versatility improved, solutions based on recurrent neural networks have emerged for recognition and prediction problems in many fields. Typical applications in earth science include land cover change detection, remote sensing image caption generation, and temperature forecasting. As with other mainstream neural networks, the amount of data required to train recurrent neural networks, the complexity of their network structures, and the time required to complete training are all growing exponentially. Therefore, implementing large-scale distributed training of recurrent neural networks on massively parallel platforms such as the Sunway TaihuLight supercomputer has become a key research topic.

Large-scale training of neural networks is based on the data-parallel mode. By increasing the training mini-batch size, the efficiency with which the model traverses the data set is improved, and so is the convergence speed of the model. The major challenges in achieving large-scale training of recurrent neural networks on Sunway TaihuLight include the computational efficiency within each model replica, the communication efficiency of distributed training, the memory allocation strategy for recurrent neural networks, and ensuring the convergence speed of the model at large training scales.

The main work of this thesis consists of three parts:

1. Based on the existing parallel training framework Sunway Caffe, a distributed model testing module is integrated into the training process, which shortens the time required for model testing. A distributed training scheme is designed that completely hides the model testing time in high-frequency testing scenarios.

2. The training process is systematically optimized from several aspects. For computational performance, the two major computational hotspots, the exponential function and the softmax function, are optimized, achieving an overall speedup of 12.63 times. For communication efficiency, redundant communication in recurrent neural network training is removed, and the MPI_Allreduce function is redesigned according to the network topology of Sunway TaihuLight, improving communication efficiency by 20 times. For memory usage, the memory allocation module is redesigned around the structure of recurrent neural networks to ensure full utilization of memory. (Minimal sketches of the softmax hotspot and of the gradient allreduce step follow this abstract.)

3. Large-scale training of recurrent neural networks is explored, covering the selection of optimization algorithms, the design of training scales, and the analysis of model convergence speed and model performance at large scale. Combining the efforts above, training on 100 nodes reduces the number of iterations required for convergence by roughly 100 times compared with the single-node training process. The work is further scaled to 800 nodes to support the training of even larger recurrent neural networks.
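The abstract names the exponential and softmax functions as the main computational hotspots but does not show the optimized kernels. As a point of reference only, the following is a minimal, numerically stable softmax in plain C++; the thesis's vectorized, Sunway-specific implementation is not reproduced here.

```cpp
// Minimal reference sketch of the softmax hotspot mentioned in the abstract.
// The thesis's SW26010-specific exponential and vectorization optimizations
// are not reproduced; this is only the standard numerically stable form.
#include <algorithm>
#include <cmath>
#include <vector>

// Compute softmax over one row of logits:
//   y_i = exp(x_i - max_j x_j) / sum_k exp(x_k - max_j x_j)
std::vector<float> softmax(const std::vector<float>& logits) {
    // Subtract the maximum logit so exp() never overflows.
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;  // normalize so the outputs form a probability distribution
    }
    return out;
}
```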
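Likewise, the redesigned, topology-aware MPI_Allreduce is only described at a high level in the abstract. The sketch below shows the generic data-parallel gradient averaging step that such a redesign accelerates, using the standard MPI_Allreduce call; the buffer name and size are placeholders, not taken from Sunway Caffe.

```cpp
// Sketch of the standard data-parallel gradient exchange that the thesis
// optimizes; the topology-aware allreduce itself is not shown here.
// Build with an MPI C++ compiler, e.g. mpicxx.
#include <mpi.h>
#include <vector>

// Average a local gradient buffer across all training nodes in place.
// 'grad' holds this node's gradients for one parameter blob.
void allreduce_gradients(std::vector<float>& grad, MPI_Comm comm) {
    int nranks = 1;
    MPI_Comm_size(comm, &nranks);

    // Sum the per-node gradients element-wise across all ranks.
    MPI_Allreduce(MPI_IN_PLACE, grad.data(),
                  static_cast<int>(grad.size()),
                  MPI_FLOAT, MPI_SUM, comm);

    // Divide by the number of ranks so the result is the global average.
    for (float& g : grad) {
        g /= static_cast<float>(nranks);
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Placeholder gradient buffer; in a real run this would come from the
    // backward pass of the recurrent layers on this node's mini-batch shard.
    std::vector<float> grad(1024, 1.0f);
    allreduce_gradients(grad, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```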
Keywords/Search Tags:Recurrent Neural Network, Large-scale Training, Sunway Taihulight