Recently, deep neural networks (DNNs) have developed rapidly and are widely used in numerous areas. To handle more complex challenges and achieve higher model accuracy, the scale of DNNs is growing explosively. As a result, DNNs consume more and more computing and storage resources, which puts great pressure on hardware. To deploy DNNs on resource-restricted devices, it is necessary to simplify and accelerate these models. DNNs consist mainly of linear and nonlinear layers; this paper achieves model compression and performance improvement by compressing the linear layers and accelerating the nonlinear layers. Linear layers contain most of a model's parameters, so quantizing them is an effective way to simplify the model. The parameters of linear layers obey a Gaussian distribution. However, mainstream methods quantize linear layers to low bit-width fixed-point numbers, which may cause large errors. Besides, fixed-point numbers are unsuitable for layers with nonlinear computations such as transcendental functions. There are few studies on nonlinear layers at present, yet in recent Transformer models the computation cost of nonlinear layers cannot be ignored. To solve these problems, this paper compresses the linear layers and accelerates the nonlinear layers of DNNs respectively.

To quantize linear layers more precisely, this paper proposes a new 8-bit floating-point format: QFP8. Compared with existing 8-bit formats, it can represent values that obey a Gaussian distribution more precisely. Since data distributions change dramatically among layers, the QFP8 format uses a dynamic bias to adjust the range of data representation, which reduces quantization error. Besides, the statistics of BatchNorm layers in quantized models are biased, and calibrating these statistics can improve inference accuracy. Experiments show that, compared with other 8-bit formats, QFP8 achieves higher inference accuracy on both post-training quantization and quantization-aware training tasks, and calibration of the BatchNorm statistics can further improve quantized model accuracy.

To accelerate nonlinear layers, this paper proposes a method based on multi-segment interpolation fitting. The method guarantees a controllable fitting error, so it can satisfy the accuracy demands of different tasks. Moreover, its computational complexity is merely O(1), so nonlinear layers can be accelerated effectively in both inference and training. It requires only basic hardware instruction support and can be employed on servers and edge devices. Experimental results show that fitting various nonlinear layers with this interpolation method achieves effective acceleration on both CPU and GPU. The interpolated nonlinear layers can be applied conveniently to various models, and inference and training achieve the promised accuracy with a degradation of less than 0.5%. Combined with quantization of the linear layers in the QFP8 format, it can further accelerate inference while preserving model accuracy.
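To make the dynamic-bias idea concrete, the following is a minimal sketch of an 8-bit floating-point quantizer whose exponent bias is chosen per tensor to minimize quantization error. The field split (1 sign, 4 exponent, 3 mantissa bits), the bias search range, and the error criterion are illustrative assumptions, not the actual QFP8 definition.

```python
import numpy as np

EXP_BITS, MAN_BITS = 4, 3  # assumed field split; QFP8's actual split may differ

def quantize_fp8(x, bias):
    """Round each value to the nearest number representable with MAN_BITS
    mantissa bits and exponents in [1 - bias, 2**EXP_BITS - 2 - bias]."""
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    # exponent of each value, clipped to the representable range
    exp = np.floor(np.log2(np.maximum(mag, 1e-38)))
    exp = np.clip(exp, 1 - bias, 2**EXP_BITS - 2 - bias)
    scale = 2.0 ** (exp - MAN_BITS)        # grid spacing at this exponent
    q = np.round(mag / scale) * scale      # round mantissa to MAN_BITS bits
    # clamp to the largest finite value of the format
    max_val = (2 - 2.0 ** -MAN_BITS) * 2.0 ** (2**EXP_BITS - 2 - bias)
    return sign * np.minimum(q, max_val)

def choose_bias(x):
    """Dynamic bias: pick the exponent bias that minimizes the mean
    squared quantization error for this particular tensor."""
    errs = {b: np.mean((x - quantize_fp8(x, b)) ** 2) for b in range(16)}
    return min(errs, key=errs.get)

w = np.random.randn(1024) * 0.05  # Gaussian-distributed weights
b = choose_bias(w)
wq = quantize_fp8(w, b)
```

Because the weights of different layers have very different scales, re-running `choose_bias` per layer shifts the representable range to where the Gaussian mass actually lies, which is the effect the dynamic bias is meant to capture.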
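The multi-segment interpolation scheme can likewise be sketched in a few lines. With uniform segments, locating the segment for an input is a single multiply and truncation, so each evaluation is O(1) regardless of the segment count, and the fitting error is controlled by choosing more segments. The segment count, input range, and clipping policy below are illustrative choices, and sigmoid stands in for whatever nonlinear layer is being fitted.

```python
import numpy as np

def build_table(fn, lo, hi, segments):
    """Precompute fn at the endpoints of uniform segments over [lo, hi]."""
    xs = np.linspace(lo, hi, segments + 1)
    return xs, fn(xs)

def interp_eval(x, xs, ys):
    """Evaluate fn(x) by linear interpolation within its segment.
    Uniform segments make the index computation O(1) per element."""
    lo, step = xs[0], xs[1] - xs[0]
    xc = np.clip(x, lo, xs[-1])               # saturate outside the fitted range
    i = np.minimum(((xc - lo) / step).astype(int), len(xs) - 2)
    t = (xc - xs[i]) / step                   # position inside the segment
    return ys[i] * (1 - t) + ys[i + 1] * t

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
xs, ys = build_table(sigmoid, -8.0, 8.0, 256)
x = np.random.randn(4096)
y = interp_eval(x, xs, ys)
```

For linear interpolation the worst-case error on a segment of width h is bounded by max|f''| * h^2 / 8, so doubling the segment count cuts the error bound by a factor of four; this is the sense in which the fitting error is controllable.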