
Research On Accelerating Algorithm Of Neural Network Based On Quantization

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: D W Wan
Full Text: PDF
GTID: 2428330623967788
Subject: Computer Science and Technology
Abstract/Summary:
Despite the remarkable success of Convolutional Neural Networks (CNNs) on various visual tasks, high computational and memory costs restrict their widespread application in consumer electronics. Recently, advances in network quantization have demonstrated success in reducing the computational and memory costs of CNNs. However, quantization methods generally lead to significant performance degradation. Thus, there is an urgent need to design highly efficient and cost-effective CNNs to promote their extensive use across edge devices.

In this work, we propose a novel approach that accelerates the high-cost dot product between ternary and binary vectors through efficient bitwise operations. Based on this acceleration method, we propose three quantization frameworks: the network with ternary inputs and binary weights (TBN), the network with scaled ternary inputs and binary weights (STBN), and the network with ternary weights and 2-bit quantized inputs (T2N), which together offer a trade-off among memory, efficiency, and performance. Compared to standard CNNs, TBN/STBN provides approximately 32x storage reduction and 40x theoretical computational acceleration on CPU. The actual runtime of our TBN/STBN implementation on an NVIDIA GPU is approximately the same as that of XNOR-Network. Various experiments show that STBN outperforms all 1-bit quantized approaches on the ImageNet classification task, while TBN outperforms XNOR-Network by up to 5.5% on the same task. Moreover, T2N outperforms all methods whose weights are 1-bit quantized and whose inputs are 2-bit quantized. In short, the proposed methods can accelerate and compress CNNs while maintaining accuracy, and can help deploy CNNs on resource-limited devices.
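The core trick described above, replacing a ternary-binary dot product with bitwise operations, can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual implementation: a ternary vector in {-1, 0, +1} is packed into a nonzero mask and a sign bitmap, a binary vector in {-1, +1} into a sign bitmap, and the dot product reduces to XOR, AND, and popcount. All function and variable names here are our own assumptions.

```python
def ternary_binary_dot(x, w):
    """Dot product of a ternary vector x (entries in {-1, 0, +1}) and a
    binary vector w (entries in {-1, +1}) via bitwise operations.

    Sketch of the popcount-style acceleration idea; in a real kernel the
    bit-packing would be done once per layer, not per dot product.
    """
    m = 0   # mask bitmap: bit i set iff x[i] != 0
    s = 0   # sign bitmap for x: bit i set iff x[i] == +1
    wb = 0  # sign bitmap for w: bit i set iff w[i] == +1
    for i, (xi, wi) in enumerate(zip(x, w)):
        if xi != 0:
            m |= 1 << i
            if xi == 1:
                s |= 1 << i
        if wi == 1:
            wb |= 1 << i
    # Within the mask, product terms are -1 exactly where the signs
    # disagree, i.e. where (s XOR wb) has a set bit.
    neg = bin((s ^ wb) & m).count("1")
    # positives + negatives = popcount(m), so dot = popcount(m) - 2*neg
    return bin(m).count("1") - 2 * neg
```

For example, with x = [1, -1, 0, 1] and w = [1, 1, -1, -1] the elementwise products are 1, -1, 0, -1, so the dot product is -1; the bitwise version reaches the same value with one XOR, one AND, and two popcounts instead of n multiply-adds, which is where the claimed speedup on bit-packed data comes from.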
Keywords/Search Tags:Acceleration and compression, convolutional neural networks, quantization