Design And Implementation Of The Latency-Aware Automated Model Pruning Technology

Posted on:2023-10-07

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Chen

GTID:2558307097994959

Subject:Computer technology

Abstract/Summary:

In recent years, the continued development of deep neural networks (DNNs) has enabled artificial intelligence applications in fields such as autonomous driving and smart homes. However, powerful DNNs often have a large number of parameters, which prevents them from being deployed efficiently on resource-constrained devices. Reducing the model size and computational cost of DNNs while maintaining their performance has therefore become an urgent challenge. Model pruning aims to remove unimportant connections from a neural network at a small cost in accuracy, and it is widely used to compress and accelerate convolutional neural networks (CNNs).

Conventional pruning techniques consider only the differing accuracy sensitivity of layers and ignore their differing latency sensitivity when determining layer sparsity. As a result, an expensive pruning-and-selection exploration process is needed to find a model with high accuracy and low latency. Moreover, prior work in filter pruning uses static characteristics of the network to determine filter importance and guide pruning, which can lead to inaccurate filter selection and serious accuracy loss.

To address these problems, this thesis proposes a latency-aware automated model pruning technology with two main components. (1) A latency-aware automated framework that uses reinforcement learning to determine layer sparsity automatically. Latency sensitivity is introduced as prior knowledge and incorporated into the exploration loop. Rather than relying on a single reward signal such as validation accuracy or floating-point operations (FLOPs), the agent receives feedback on both accuracy error and latency sensitivity, so substructures with better accuracy and latency can be found. (2) A novel intra-layer filter pruning algorithm that distinguishes the important filters within a layer based on their dynamic changes. The underlying principle is that more active filters adapt better to the incomplete network and can compensate for the representation capability of the pruned filters. The algorithm also includes a newly proposed filter regeneration strategy, enabling more precise intra-layer filter pruning.

Compared with state-of-the-art handcrafted and automated compression policies, the technology demonstrates superior performance for VGGNet, ResNet, and MobileNet on the CIFAR-10, ImageNet, and Food-101 datasets. It speeds up MobileNet-V1 inference by approximately 1.64 times on a Titan RTX GPU with no loss of ImageNet Top-1 accuracy, and it significantly improves the Pareto-optimal curve of the accuracy-latency trade-off.
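
The two components described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, since the abstract gives no formulas. It assumes that a filter's "dynamic change" is measured as the L2 norm of its weight change between two training checkpoints, and that the reward scales the accuracy error by relative latency. The function names, the `beta` exponent, and both formulas are hypothetical.

```python
from math import sqrt


def filter_activity(before, after):
    """Per-filter L2 norm of the weight change between two checkpoints.

    Assumption: 'dynamic change' is approximated by how far each filter's
    flattened weights moved during training. The thesis may define it differently.
    """
    return [
        sqrt(sum((a - b) ** 2 for a, b in zip(f_after, f_before)))
        for f_before, f_after in zip(before, after)
    ]


def select_filters(before, after, sparsity):
    """Return the sorted indices of the filters kept at the given layer sparsity."""
    scores = filter_activity(before, after)
    n_prune = int(len(scores) * sparsity)
    # Least active filters come first and are pruned first.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_prune:])


def latency_aware_reward(error, latency, base_latency, beta=1.0):
    """Hypothetical reward combining accuracy error and relative latency.

    Lower error and lower latency both move the reward toward zero (better).
    """
    return -error * (latency / base_latency) ** beta


# Four filters with two weights each, before and after some training steps.
before = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
after = [[0.0, 0.1], [1.5, 1.0], [2.0, 2.0], [4.0, 4.0]]
kept = select_filters(before, after, sparsity=0.5)  # keeps filters [1, 3]
reward = latency_aware_reward(error=0.1, latency=5.0, base_latency=10.0)
```

In a reinforcement-learning loop of the kind the abstract outlines, an agent would propose a `sparsity` value per layer and receive a reward of this general shape after the pruned model is evaluated. The actual state, action, and reward definitions are not given in the abstract.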

Keywords/Search Tags:

Artificial Intelligence, Deep Learning, Model Compression and Acceleration, Reinforcement Learning

Related items

1	Research And Application Of Game Artificial Intelligence System Based On Machine Learning Methods
2	A Research Of Deep Reinforcement Learning Algorithms In Combination With Multi-relations
3	Research On Multi-Agent Deep Reinforcement Learning Methods And Applications
4	Research On Automated Structure Optimization Technology Of Edge Intelligence Models Based On Deep Reinforcement Learning
5	High Performance Artificial Intelligence Computing With Algorithm-hardware Co-design
6	Reinforcement Learning Agent Design Based On Deep Perception And Imitation Learning
7	Research On Command Decision Method From RTS Perspective On Deep Learning
8	Model Compression And Forward Acceleration Based On Embedded Deep Neural Network
9	Deep Neural Networks Compression And Acceleration Based On AutoML
10	Research On Chess Game Based On Deep Reinforcement Learning