
Compression And Acceleration Of Recurrent Neural Network Model

Posted on: 2021-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Li    Full Text: PDF
GTID: 2428330614468306    Subject: Engineering
Abstract/Summary:
Recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs), are well suited to sequential data and have greatly improved the accuracy of tasks such as speech recognition, natural language processing, and machine translation. However, recurrent neural networks contain a huge number of parameters and their inference is computationally expensive, while edge devices have limited computing resources and storage space and therefore struggle to run large neural networks. At the same time, for server-side applications that must handle a large number of concurrent requests, as well as for applications that must run quickly offline on storage-limited edge devices, deploying recurrent neural network models remains very difficult. Model compression is a technique that identifies the key structure of a model; it can effectively shrink the model and speed up inference, and it is the key technology for deploying neural network models on edge devices.

Many results already exist for compressing convolutional neural network models. Among the few existing methods for compressing recurrent neural network models, pruning considers only the importance of the weights within a single recurrent layer. Such methods therefore have two problems. First, the existing pruning rules for recurrent neural network models consider only the relative importance of weights within one recurrent layer and ignore the differences in parameter scale across the layers of the whole network. Second, they rely only on weight-magnitude rules, ignore the similarity between recurrent units, and fail to eliminate the redundancy among recurrent units.

To address these problems, this thesis proposes a pruning method that combines the L2 norm with similarity in order to compress and accelerate recurrent neural network models. The method normalizes the weights of each layer of the recurrent neural network so that importance scores are comparable across the whole network. At the same time, it takes the similarity between recurrent units into account and further eliminates the redundancy among them.

In addition, existing pruning methods rely on pre-trained weights and prune according to magnitude-based rules. Some recent studies show that the effectiveness of a sub-network comes from its structure rather than from the pre-trained weights. Building on this research, we propose a model compression method based on exploring the optimal initialization structure: only the sub-network structure is considered during compression, which reduces the dependence on the weights of a pre-trained model. Compared with existing methods in the field, this thesis proposes a new compression rule that takes the overlap between recurrent units into account and heuristically explores the optimal initialization structure of the recurrent neural network, thereby reducing the dependence of pruning on weights. Experiments reported in the thesis achieve good results.
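The abstract does not spell out the exact pruning rule, so the following is only a minimal sketch of the general idea it describes: score the units of one recurrent layer by a layer-normalized L2 norm, then use pairwise cosine similarity to drop units that largely duplicate a stronger unit. The function and parameter names (prune_recurrent_units, keep_ratio, sim_threshold) are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

def prune_recurrent_units(weight, keep_ratio=0.5, sim_threshold=0.95):
    """Sketch of L2-norm-plus-similarity pruning for one recurrent layer.
    `weight` is assumed to hold one row per recurrent unit."""
    # L2 norm of each unit's weights, normalized within the layer so that
    # scores remain comparable across layers with different parameter scales.
    norms = weight.norm(p=2, dim=1)
    scores = norms / (norms.max() + 1e-12)

    # Cosine similarity between every pair of units.
    unit_dirs = weight / (norms.unsqueeze(1) + 1e-12)
    similarity = unit_dirs @ unit_dirs.t()

    # Visit units from strongest to weakest; keep a unit only if it is not
    # too similar to an already-kept unit, until the keep budget is filled.
    order = torch.argsort(scores, descending=True)
    kept = []
    budget = max(1, int(keep_ratio * weight.size(0)))
    for idx in order.tolist():
        if len(kept) >= budget:
            break
        if all(similarity[idx, j].item() < sim_threshold for j in kept):
            kept.append(idx)
    return sorted(kept)

if __name__ == "__main__":
    # Toy example: keep half of the units of an 8-unit layer.
    torch.manual_seed(0)
    w = torch.randn(8, 16)
    print(prune_recurrent_units(w, keep_ratio=0.5))
```

The per-layer normalization and the similarity check correspond to the two problems the abstract raises: comparability of weight scales across the network, and redundancy among recurrent units.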
Keywords/Search Tags:Model compression, structured pruning, recurrent neural network