
On Improving Computing Performance Of DNNs Under Resource Constraints

Posted on: 2021-10-30    Degree: Master    Type: Thesis
Country: China    Candidate: W C Wang    Full Text: PDF
GTID: 2518306452463444    Subject: Master of Engineering
Abstract/Summary:
With the rapid development of deep learning and artificial intelligence applications, the demand for deploying neural networks on resource-constrained platforms such as mobile terminals is growing dramatically. However, the computing and storage resources of such platforms are relatively limited, which makes it difficult to deploy large-scale convolutional or deep neural networks on them. This thesis addresses the problem with neural network compression: by compressing the network through "quantization + sparsification", the number of network parameters and the amount of computation required to run the network are greatly reduced, so that large-scale neural networks can be deployed effectively on resource-constrained platforms. Traditional quantization and sparsification, however, have limitations, such as the difficulty of preserving a regular network structure and an over-reliance on custom accelerators, which make efficient deployment hard. To solve these problems, this thesis studies two aspects. First, drawing on ideas from quantization optimization, a compression algorithm based on approximate quantization of the weight matrix is proposed to reduce the storage that large-scale neural networks demand from the hardware and to realize compressed storage of the network. Second, the quantized network is sparsified, the irregular structure caused by sparsification is optimized so that the underlying weight matrices can be compressed and reorganized, and a matrix multiplication optimization algorithm is then designed to achieve efficient network computation and data access.

The network is first quantized. Because traditional quantization introduces a large amount of extra computation and depends heavily on specialized libraries, an approximate quantization optimization algorithm is proposed. The algorithm first approximates the values of the elements of each weight matrix and extracts all distinct values of the approximated matrix as the quantization codebook; it then further compresses the codebook according to the value range of the elements in each layer's codebook and the regularity between their positive and negative values; finally, the 32-bit floating-point numbers in the weight matrix are replaced by 8-bit fixed-point numbers, reducing the model size to 1/4 of the original. Experimental results show that, compared with widely used scalar quantization, this method reduces both the extra computation and the codebook redundancy, and achieves nearly lossless compression of the neural network.
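The abstract gives no code, so the following is only a minimal sketch of codebook-based 8-bit fixed-point quantization of a weight matrix. The per-layer scale, the symmetric rounding rule, and the helper names quantize_layer and dequantize_layer are illustrative assumptions, not the author's actual algorithm (in particular, the thesis's codebook compression by value range and sign pattern is not reproduced here).

import numpy as np

def quantize_layer(w: np.ndarray, n_bits: int = 8):
    """Approximate a float32 weight matrix with 8-bit fixed-point codes.

    Returns the int8 codes, the per-layer scale, and the codebook of
    distinct reconstructed values (at most 2**n_bits entries).
    """
    qmax = 2 ** (n_bits - 1) - 1                   # 127 for signed 8-bit codes
    scale = np.abs(w).max() / qmax                 # per-layer scale factor
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    codebook = np.unique(codes) * scale            # distinct approximated values
    return codes, scale, codebook

def dequantize_layer(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float32 weight matrix from its codes."""
    return codes.astype(np.float32) * scale

# Storing int8 codes instead of float32 weights cuts the stored size to
# roughly 1/4, matching the compression ratio reported in the abstract.
w = np.random.randn(256, 512).astype(np.float32)
codes, scale, codebook = quantize_layer(w)
w_hat = dequantize_layer(codes, scale)
print(codes.nbytes / w.nbytes)                     # -> 0.25
print(np.abs(w - w_hat).max())                     # small approximation error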
After quantization, the network is further compressed by sparsification. Sparsification often leaves the underlying weight matrices with an irregular structure, which makes it difficult to implement the underlying matrix operations and data accesses efficiently. In this thesis, the forward pass of the sparse network is treated as sparse matrix multiplication. During optimization, only the non-zero elements of each sparse weight matrix are retained and the matrix is compressed row by row. An iterative multiway number-partitioning algorithm is then proposed: the lengths of the compressed rows are treated as the integers to be partitioned, and the optimal grouping scheme produced by the partitioning algorithm is applied to the compression and reorganization, yielding a regular layout of the sparse matrix; the sparse matrix multiplication is then optimized on top of this layout to improve network performance. Experimental results show that, for the same network parameters, the proposed algorithm speeds up sparse matrix multiplication by about 30% compared with a sparse matrix multiplication implemented with an off-the-shelf library routine.
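As a rough illustration of the row-grouping idea, the sketch below compresses each row of a sparse weight matrix to its non-zero entries and then assigns the rows to a fixed number of groups so that the groups have roughly equal total length. The greedy longest-first heuristic and the function names compress_rows and partition_rows are assumptions made here for illustration; the thesis's iterative multiway number-partitioning algorithm and its layout/SpMM optimizations are not reproduced.

import heapq
import numpy as np

def compress_rows(w: np.ndarray):
    """Keep only the non-zero entries of each row (values + column indices)."""
    rows = []
    for r in w:
        cols = np.nonzero(r)[0]
        rows.append((r[cols], cols))
    return rows

def partition_rows(rows, n_groups: int):
    """Assign rows to n_groups so the group workloads are roughly balanced.

    The compressed row lengths play the role of the integers being
    partitioned; a greedy heuristic (largest row to the currently lightest
    group) stands in for the iterative multiway partitioning of the thesis.
    """
    order = sorted(range(len(rows)), key=lambda i: -len(rows[i][0]))
    heap = [(0, g, []) for g in range(n_groups)]   # (total length, group id, row ids)
    heapq.heapify(heap)
    for i in order:
        total, g, members = heapq.heappop(heap)
        members.append(i)
        heapq.heappush(heap, (total + len(rows[i][0]), g, members))
    return [members for _, _, members in sorted(heap, key=lambda t: t[1])]

# Example: the groups end up with nearly equal numbers of non-zeros, so each
# group's packed rows can be laid out regularly and processed with a similar
# amount of work in the subsequent sparse matrix multiplication.
w = np.random.randn(8, 16) * (np.random.rand(8, 16) > 0.7)
groups = partition_rows(compress_rows(w), n_groups=2)
print(groups)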
Keywords/Search Tags: Resource constrained, Network Quantization, Quantization Codebook, Sparse Network, Number Partition