
On Improving Computing Performance Of DNNs Under Resource Constraints

Posted on: 2021-10-30    Degree: Master    Type: Thesis
Country: China    Candidate: W C Wang    Full Text: PDF
GTID: 2518306452463444    Subject: Master of Engineering
Abstract/Summary:
With the rapid development of deep learning and artificial intelligence applications, the demand for deploying neural networks on resource-constrained platforms such as mobile terminals is growing dramatically. However, the computing and storage resources of such platforms are relatively limited, which makes it difficult to deploy large-scale convolutional or deep neural networks on them. This thesis addresses the problem with neural network compression: by compressing the network through "quantization + sparsification", the number of network parameters and the amount of computation required to run the network are greatly reduced, so that large-scale neural networks can be deployed effectively on resource-constrained platforms. Traditional quantization and sparsification, however, have limitations, such as the difficulty of preserving a regular network structure and an over-reliance on custom accelerators, which make efficient deployment hard. To solve these problems, this thesis studies two aspects. First, drawing on ideas from quantization optimization, a compression algorithm based on approximate quantization of the weight matrix is proposed to reduce the storage that large-scale neural networks demand from the hardware and to realize compressed storage of the network. Second, the quantized network is sparsified, the irregular structure caused by sparsification is optimized so that the underlying weight matrices can be compressed and reorganized, and a matrix multiplication optimization algorithm is then designed to achieve efficient network computation and data access.

The network is first quantized. Because traditional quantization introduces a large amount of extra computation and depends heavily on specialized libraries, an approximate quantization optimization algorithm is proposed. The algorithm first approximates the values of the elements of each weight matrix and extracts all distinct values of the approximated matrix as the quantization codebook; it then further compresses the codebook according to the value range of the elements in each layer's codebook and the regularity between their positive and negative values; finally, the 32-bit floating-point numbers in the weight matrix are replaced by 8-bit fixed-point numbers, reducing the model size to 1/4 of the original. Experimental results show that, compared with widely used scalar quantization, this method reduces both the extra computation and the codebook redundancy, and achieves nearly lossless compression of the neural network.
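The abstract gives no code, so the following is only a minimal sketch of codebook-based 8-bit fixed-point quantization of a weight matrix. The per-layer scale, the symmetric rounding rule, and the helper names quantize_layer and dequantize_layer are illustrative assumptions, not the author's actual algorithm (in particular, the thesis's codebook compression by value range and sign pattern is not reproduced here).

import numpy as np

def quantize_layer(w: np.ndarray, n_bits: int = 8):
    """Approximate a float32 weight matrix with 8-bit fixed-point codes.

    Returns the int8 codes, the per-layer scale, and the codebook of
    distinct reconstructed values (at most 2**n_bits entries).
    """
    qmax = 2 ** (n_bits - 1) - 1                   # 127 for signed 8-bit codes
    scale = np.abs(w).max() / qmax                 # per-layer scale factor
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    codebook = np.unique(codes) * scale            # distinct approximated values
    return codes, scale, codebook

def dequantize_layer(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float32 weight matrix from its codes."""
    return codes.astype(np.float32) * scale

# Storing int8 codes instead of float32 weights cuts the stored size to
# roughly 1/4, matching the compression ratio reported in the abstract.
w = np.random.randn(256, 512).astype(np.float32)
codes, scale, codebook = quantize_layer(w)
w_hat = dequantize_layer(codes, scale)
print(codes.nbytes / w.nbytes)                     # -> 0.25
print(np.abs(w - w_hat).max())                     # small approximation error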
After quantization, the network is further compressed by sparsification. Sparsification often leaves the underlying weight matrices with an irregular structure, which makes it difficult to implement the underlying matrix operations and data accesses efficiently. In this thesis, the forward pass of the sparse network is treated as sparse matrix multiplication. During optimization, only the non-zero elements of each sparse weight matrix are retained and the matrix is compressed row by row. An iterative multiway number-partitioning algorithm is then proposed: the lengths of the compressed rows are treated as the integers to be partitioned, and the optimal grouping scheme produced by the partitioning algorithm is applied to the compression and reorganization, yielding a regular layout of the sparse matrix; the sparse matrix multiplication is then optimized on top of this layout to improve network performance. Experimental results show that, for the same network parameters, the proposed algorithm speeds up sparse matrix multiplication by about 30% compared with a sparse matrix multiplication implemented with an off-the-shelf library routine.
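As a rough illustration of the row-grouping idea, the sketch below compresses each row of a sparse weight matrix to its non-zero entries and then assigns the rows to a fixed number of groups so that the groups have roughly equal total length. The greedy longest-first heuristic and the function names compress_rows and partition_rows are assumptions made here for illustration; the thesis's iterative multiway number-partitioning algorithm and its layout/SpMM optimizations are not reproduced.

import heapq
import numpy as np

def compress_rows(w: np.ndarray):
    """Keep only the non-zero entries of each row (values + column indices)."""
    rows = []
    for r in w:
        cols = np.nonzero(r)[0]
        rows.append((r[cols], cols))
    return rows

def partition_rows(rows, n_groups: int):
    """Assign rows to n_groups so the group workloads are roughly balanced.

    The compressed row lengths play the role of the integers being
    partitioned; a greedy heuristic (largest row to the currently lightest
    group) stands in for the iterative multiway partitioning of the thesis.
    """
    order = sorted(range(len(rows)), key=lambda i: -len(rows[i][0]))
    heap = [(0, g, []) for g in range(n_groups)]   # (total length, group id, row ids)
    heapq.heapify(heap)
    for i in order:
        total, g, members = heapq.heappop(heap)
        members.append(i)
        heapq.heappush(heap, (total + len(rows[i][0]), g, members))
    return [members for _, _, members in sorted(heap, key=lambda t: t[1])]

# Example: the groups end up with nearly equal numbers of non-zeros, so each
# group's packed rows can be laid out regularly and processed with a similar
# amount of work in the subsequent sparse matrix multiplication.
w = np.random.randn(8, 16) * (np.random.rand(8, 16) > 0.7)
groups = partition_rows(compress_rows(w), n_groups=2)
print(groups)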
Keywords/Search Tags: Resource constrained, Network Quantization, Quantization Codebook, Sparse Network, Number Partition