
Study Of Sparse Neural Networks And Sparse Neural Network Accelerators

Posted on: 2020-08-06  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X D Zhou  Full Text: PDF
GTID: 1368330572969072  Subject: Computer system architecture
Abstract/Summary:
Neural networks have rapidly become the dominant algorithms, achieving state-of-the-art performance in a broad range of applications such as image recognition, object detection, speech recognition, and natural language processing. However, neural networks keep moving toward deeper and larger architectures, and the resulting volume of data and computation poses a great challenge. Although sparsity has emerged as an effective way to directly reduce the intensity of computation and memory accesses, the irregularity caused by sparsity (in both sparse synapses and sparse neurons) prevents processing platforms, including CPUs, GPUs, and accelerators, from fully exploiting its benefits.

In this dissertation, we propose a cooperative software/hardware approach to handle the irregularity of sparse neural networks efficiently.

First, based on a wide range of experiments, we observe local convergence: larger weights tend to gather into small clusters during training rather than being randomly distributed. Based on this key observation, we propose a software-based coarse-grained pruning technique that drastically reduces the irregularity of sparse synapses. Instead of pruning synapses independently, coarse-grained pruning removes several synapses together: the synapses are first divided into blocks, and a block of synapses is permanently removed from the network topology if it meets a specific criterion. We then fine-tune the network to retain its accuracy. Note that we apply coarse-grained pruning iteratively during training to achieve better sparsity while avoiding accuracy loss. Coarse-grained pruning reduces irregularity by 20.13× on average. We then introduce a novel compression algorithm, a three-stage pipeline of coarse-grained pruning, local quantization, and entropy encoding, which together reduce the storage requirements of AlexNet and VGG16 by 79× and 98×, respectively. These compression ratios are much higher than those achieved by two existing state-of-the-art neural network compression methods, Deep Compression (35× and 49×) and CNNPack (39× and 46×).

We further design a hardware accelerator named Cambricon-S to efficiently handle the remaining irregularity of sparse synapses and neurons. The accelerator features a central neuron selector module (NSM) to leverage coarse-grained sparsity. Additionally, a synapse selector module (SSM), an encoder, and a weight decoding module (WDM) are used to leverage neuron sparsity, dynamically compress neurons, and exploit local quantization, respectively. Compared with Cambricon-X, a state-of-the-art sparse neural network accelerator, our accelerator is 1.71× better in performance and 1.75× better in energy efficiency. To ease the burden on programmers, we also propose a highly efficient, library-based programming environment for our accelerator, whose compiler applies loop tiling and data-reuse strategies to generate highly efficient instructions.
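The coarse-grained pruning step described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the block shape and the mean-absolute-magnitude criterion are assumptions for the example, since the abstract does not specify the exact pruning criterion.

```python
import numpy as np

def coarse_grained_prune(weights, block_shape, threshold):
    """Prune whole blocks of synapses together.

    A block is zeroed out when the mean absolute weight inside it
    falls below `threshold` -- one plausible criterion; the
    dissertation's exact criterion may differ.
    Returns the pruned weights and the binary block mask.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    mask = np.ones_like(weights)
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            block = weights[i:i + br, j:j + bc]
            if np.abs(block).mean() < threshold:
                # Remove the entire block of synapses at once,
                # keeping the non-zero pattern block-regular.
                mask[i:i + br, j:j + bc] = 0.0
    return weights * mask, mask

# Example: one strong 2x2 block survives, one weak block is pruned.
W = np.array([[1.0, 1.0, 0.01, 0.01],
              [1.0, 1.0, 0.01, 0.01]])
pruned, mask = coarse_grained_prune(W, block_shape=(2, 2), threshold=0.5)
```

In the full method this pruning would be interleaved with fine-tuning and applied iteratively during training; the sketch covers only the block-removal step.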
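The loop tiling mentioned for the compiler can be illustrated with a generic tiled matrix multiply. This is a textbook sketch of the technique, not the accelerator's code generation: the tile size, loop order, and use of plain Python lists are illustrative assumptions.

```python
def tiled_matmul(A, B, tile=32):
    """Loop-tiled matrix multiply: C = A @ B.

    Iterating over tiles keeps small sub-blocks of A, B, and C hot
    in fast local storage, reusing each loaded element many times --
    the data-reuse strategy that tiling enables.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):          # tile over rows of C
        for jj in range(0, m, tile):      # tile over columns of C
            for kk in range(0, k, tile):  # tile over the reduction dim
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        s = C[i][j]
                        for p in range(kk, min(kk + tile, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] = s
    return C
```

On an accelerator the same idea maps tiles onto on-chip buffers so that each fetched operand is reused across an entire tile rather than re-read from memory.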
Keywords/Search Tags:neural networks, sparsity, compression, accelerator