With the rapid development of the Internet of Things (IoT), more and more devices and sensors are used to collect real-time data. The growth of data volume and the diversity of application scenarios make real-time processing and latency increasingly prominent problems. Deep learning scales well and offers high-level representation capabilities, so it performs excellently on large-scale data and is widely used in many fields. However, a deep neural network (DNN) has numerous parameters, and its computational overhead and memory requirements are very high, making it difficult to deploy on resource-constrained devices. Moreover, compared with advanced edge devices, microcontroller units (MCUs) closer to the data source have even more limited storage and computing resources, so deploying and running DNN models on MCUs is a major challenge. Specifically, on the one hand, in scenarios where only a single MCU performs model inference, compressing a DNN at a high ratio with existing compression methods causes significant accuracy loss; how to compress the model substantially while minimizing accuracy loss under the resource constraints is the problem to be optimized in current schemes. On the other hand, since the computing power and processing speed of a single device may not meet the inference requirements of high-performance applications, multiple devices can cooperate through distributed computing for collaborative inference. However, current multi-device collaborative inference schemes usually require every device to store the complete model, which clearly cannot be applied to collaborative inference systems involving MCUs. In this scenario, how to partition the DNN model according to the resource heterogeneity of devices and the network, and how to improve the communication efficiency between devices, are key issues that urgently need to be addressed.

To address these problems and challenges, this thesis studies model-inference optimization techniques for resource-constrained devices. The main work and innovations are as follows:

(1) In scenarios where only a single MCU-based edge device performs model inference, to address the accuracy loss caused by aggressive compression, this thesis proposes a DNN optimization method based on model compression techniques. On the one hand, it uses pruning for substantial compression and designs a pruning-oriented knowledge distillation method, in which important intermediate-layer feature information is extracted from the complex model to train the compressed model. On the other hand, since floating-point parameters must be converted to fixed-point integers when deploying models on resource-constrained devices, this thesis designs a two-stage distillation method combined with quantization-aware training. Experimental results demonstrate that the proposed method achieves a compression rate as high as 97%, improving inference speed while maintaining inference accuracy.

(2) In scenarios where multiple edge devices, including MCUs, perform collaborative inference, to address the problem that some devices cannot store the complete model, this thesis proposes a multi-device collaborative inference algorithm based on model partitioning. First, a new layer-fusion partitioning method is proposed to reduce the frequency of data transmission between devices. Next, a communication data compression strategy is proposed in combination with quantization techniques, and an inference-time model is constructed as the basis for model partitioning, which is applied to the proposed neural-network layer partitioning method. Finally, considering the characteristics of the neural network layers, the computational capacity of the edge devices, and the network environment, this thesis proposes a model partitioning algorithm that divides the DNN model into multiple sub-models. Experimental results show that the proposed method achieves speedup ratios of 1.56× to 2.78× with 2 to 5 edge devices without any loss of accuracy, outperforming other schemes.
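The compression method in contribution (1) combines knowledge distillation with quantization-aware training. As a rough illustration of the two building blocks only, the following NumPy sketch computes a softened-logit distillation loss and a fake-quantized weight tensor; the function names, temperature, and bit width are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between the teacher's softened distribution (soft
    targets) and the student's, as used in logit-based distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def fake_quantize(w, num_bits=8):
    """Simulate fixed-point integer conversion during training (the core of
    quantization-aware training): round weights to an integer grid, then
    map them back to float so gradients can still flow in real training."""
    scale = (w.max() - w.min()) / (2 ** num_bits - 1)
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 2 ** num_bits - 1)
    return (q - zero_point) * scale
```

In actual training, the total loss would weight such a distillation term together with the task loss (and, in the thesis, with intermediate-feature terms), and fake quantization would be applied inside the forward pass.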
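The partitioning decision in contribution (2) trades per-device compute time against the time to transmit activations at each cut, guided by an inference-time model. The following simplified sketch is a hypothetical stand-in for that idea, not the thesis algorithm: it brute-forces contiguous split points for a sequential pipeline of heterogeneous devices, using assumed per-layer costs (`layer_flops`), activation sizes (`out_sizes`), device speeds, and a single shared bandwidth.

```python
from itertools import combinations

def pipeline_latency(layer_flops, out_sizes, device_speed, bandwidth, cuts):
    """Estimated single-input latency: each device's compute time on its
    contiguous layer segment, plus the transfer time of the activations
    crossing each cut (devices run sequentially on one input)."""
    bounds = [0] + list(cuts) + [len(layer_flops)]
    total = 0.0
    for d in range(len(bounds) - 1):
        seg = layer_flops[bounds[d]:bounds[d + 1]]
        total += sum(seg) / device_speed[d]       # compute on device d
        if d < len(bounds) - 2:                   # send activations onward
            total += out_sizes[bounds[d + 1] - 1] / bandwidth
    return total

def best_partition(layer_flops, out_sizes, device_speed, bandwidth):
    """Exhaustively search contiguous splits minimizing the latency model."""
    n, k = len(layer_flops), len(device_speed)
    best = (float("inf"), None)
    for cuts in combinations(range(1, n), k - 1):
        t = pipeline_latency(layer_flops, out_sizes, device_speed, bandwidth, cuts)
        if t < best[0]:
            best = (t, cuts)
    return best
```

Such a model naturally prefers cuts where activations are small, which is also why the thesis pairs partitioning with layer fusion and communication-data quantization to shrink what crosses the network.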