
Research On Accelerating Algorithm Of Neural Network Based On Quantization

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: D W Wan
Full Text: PDF
GTID: 2428330623967788
Subject: Computer Science and Technology
Abstract/Summary:
Despite the remarkable success of Convolutional Neural Networks (CNNs) on various visual tasks, high computational and memory costs restrict their widespread application in consumer electronics. Recently, advances in network quantization have demonstrated success in reducing the computational and memory costs of CNNs. However, quantization methods generally lead to significant performance degradation. Thus, there is an urgent need to design highly efficient and cost-effective CNNs to promote their extensive use across edge devices.

In this work, we propose a novel approach that accelerates the high-cost dot product between ternary and binary vectors through efficient bitwise operations. Based on this acceleration method, we propose three quantization frameworks: the network with ternary inputs and binary weights (TBN), the network with scaled ternary inputs and binary weights (STBN), and the network with ternary weights and 2-bit quantized inputs (T2N), which together offer a trade-off among memory, efficiency, and performance. Compared to standard CNNs, TBN/STBN provides approximately 32x storage reduction and 40x theoretical computational acceleration on CPU. The actual runtime of our TBN/STBN implementation on an NVIDIA GPU is approximately the same as that of XNOR-Network. Various experiments show that STBN outperforms all 1-bit quantized approaches on the ImageNet classification task, while TBN outperforms XNOR-Network by up to 5.5% on the same task. Moreover, T2N outperforms all methods whose weights are 1-bit quantized and whose inputs are 2-bit quantized. In short, the proposed methods can accelerate and compress CNNs while maintaining accuracy, and can help deploy CNNs on resource-limited devices.
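The core trick described above, replacing a ternary-binary dot product with bitwise operations, can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual implementation: a ternary vector in {-1, 0, +1} is packed into a nonzero mask and a sign bitmap, a binary vector in {-1, +1} into a sign bitmap, and the dot product reduces to XOR, AND, and popcount. All function and variable names here are our own assumptions.

```python
def ternary_binary_dot(x, w):
    """Dot product of a ternary vector x (entries in {-1, 0, +1}) and a
    binary vector w (entries in {-1, +1}) via bitwise operations.

    Sketch of the popcount-style acceleration idea; in a real kernel the
    bit-packing would be done once per layer, not per dot product.
    """
    m = 0   # mask bitmap: bit i set iff x[i] != 0
    s = 0   # sign bitmap for x: bit i set iff x[i] == +1
    wb = 0  # sign bitmap for w: bit i set iff w[i] == +1
    for i, (xi, wi) in enumerate(zip(x, w)):
        if xi != 0:
            m |= 1 << i
            if xi == 1:
                s |= 1 << i
        if wi == 1:
            wb |= 1 << i
    # Within the mask, product terms are -1 exactly where the signs
    # disagree, i.e. where (s XOR wb) has a set bit.
    neg = bin((s ^ wb) & m).count("1")
    # positives + negatives = popcount(m), so dot = popcount(m) - 2*neg
    return bin(m).count("1") - 2 * neg
```

For example, with x = [1, -1, 0, 1] and w = [1, 1, -1, -1] the elementwise products are 1, -1, 0, -1, so the dot product is -1; the bitwise version reaches the same value with one XOR, one AND, and two popcounts instead of n multiply-adds, which is where the claimed speedup on bit-packed data comes from.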
Keywords/Search Tags:Acceleration and compression, convolutional neural networks, quantization