
Research and Implementation of a Deep Convolutional Neural Network Compression Algorithm

Posted on: 2020-01-01    Degree: Master    Type: Thesis
Country: China    Candidate: Q Jia    Full Text: PDF
GTID: 2428330575498585    Subject: Control Science and Engineering
Abstract/Summary:
Deep convolutional neural networks (DCNNs) have become a widely used technique in computer vision and artificial intelligence, achieving great success in tasks such as image classification, recognition, and object detection. However, most network models are computationally expensive and memory intensive, making them difficult to deploy on portable platforms such as modern smartphones, self-driving cars, and other embedded devices. It is therefore important to compress these models and turn dense network structures into sparse ones. Weight pruning exploits the redundancy in the number of weights in DCNNs, while weight clustering reduces the redundancy of weight representations and the multiplication overhead through weight sharing. However, pruning or clustering alone achieves only a limited compression ratio. Moreover, current pruning methods lack an effective theoretical basis, manually presetting the pruning rate of each layer is cumbersome, and the number of clusters and the selection of centroids in existing clustering schemes lack adaptability. This thesis therefore studies weight pruning and clustering algorithms as follows:

(1) A global dynamic saliency-based weight pruning algorithm computes the weight gradient information generated by all data samples during training. The product of the normalized gradient value and the current weight value serves as the final saliency, giving a more comprehensive measure of each weight's contribution to network performance (a minimal sketch of this criterion is given after the abstract). By setting a single global pruning rate, the algorithm automatically derives the per-layer pruning rates and fully accounts for the correlation between network layers. Finally, pruning and retraining are executed iteratively to correct erroneous pruning and preserve the performance of the pruned network. Experimental results show that this pruning method effectively reduces parameter redundancy and improves network accuracy.

(2) Weight pruning for a dedicated hardware acceleration circuit computes the saliency and prunes the weights of the convolutional and fully-connected layers separately, balancing the number of pruned parameters against the amount of computation. Experimental results confirm that this pruning method significantly reduces the computation overhead and allows the pruned sparse network to be deployed easily on a hardware platform to accelerate computation.

(3) The adaptive clustering model compression algorithm consists of two steps. First, all sample objects are scanned to build a clustering feature tree with BIRCH hierarchical clustering, so the number of clusters is obtained adaptively instead of being set manually. Then the K-Means++ algorithm is introduced to adaptively select the initial centroid of each cluster (a sketch of this two-step clustering is also given below). Experimental results show that combining BIRCH and K-Means++ in this adaptive clustering algorithm yields more reasonable cluster counts and centers, achieves more weight sharing while maintaining network accuracy, and makes the whole clustering process efficient and automatic.
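For item (1), the following is a minimal Python/PyTorch sketch of the gradient-times-weight saliency criterion driven by a single global pruning rate. The function names, the restriction to tensors with more than one dimension, and the per-tensor max normalization are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

def compute_saliency(model, data_loader, loss_fn, device="cpu"):
    """Accumulate |gradient| over all samples, then score each weight by
    normalized-gradient * |weight| (the saliency used for global pruning)."""
    grad_sums = {n: torch.zeros_like(p) for n, p in model.named_parameters()
                 if p.dim() > 1}                      # weight tensors only
    model.to(device)
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        for n, p in model.named_parameters():
            if n in grad_sums and p.grad is not None:
                grad_sums[n] += p.grad.abs()
    saliency = {}
    for n, p in model.named_parameters():
        if n in grad_sums:
            g = grad_sums[n]
            g = g / (g.max() + 1e-12)                 # normalize gradients to [0, 1]
            saliency[n] = g * p.detach().abs()        # contribution estimate
    return saliency

def build_masks(saliency, global_rate=0.7):
    """Derive per-layer masks from ONE global pruning rate: the threshold is
    taken over all layers jointly, so each layer's rate emerges automatically."""
    all_scores = torch.cat([s.flatten() for s in saliency.values()])
    k = int(global_rate * all_scores.numel())
    threshold = torch.kthvalue(all_scores, max(k, 1)).values
    return {n: (s > threshold).float() for n, s in saliency.items()}
```

Pruning and retraining would then alternate, with each weight tensor multiplied by its mask after every update; that retraining loop is omitted here.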
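For item (3), here is a sketch of the two-step adaptive weight-sharing procedure using scikit-learn's Birch and KMeans (with k-means++ initialization). The BIRCH threshold value and the way the cluster count is read from the leaf subclusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import Birch, KMeans

def adaptive_weight_sharing(weights, birch_threshold=0.05):
    """Cluster a layer's weights: BIRCH decides how many clusters are needed,
    K-Means++ then selects initial centroids and refines the shared values."""
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 1)   # one weight per sample

    # Step 1: BIRCH builds a clustering feature tree; with n_clusters=None the
    # number of leaf subclusters serves as the adaptively obtained cluster count.
    birch = Birch(threshold=birch_threshold, n_clusters=None).fit(w)
    n_clusters = birch.subcluster_centers_.shape[0]

    # Step 2: K-Means with k-means++ initialization picks and refines centroids.
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(w)

    # Every weight is replaced by its cluster center, so only n_clusters distinct
    # values (plus per-weight cluster indices) need to be stored.
    shared = km.cluster_centers_[km.labels_].reshape(np.shape(weights))
    return shared, km.cluster_centers_.ravel(), km.labels_

# Example usage on one layer's weights (hypothetical variable names):
# layer_w = conv_layer.weight.detach().cpu().numpy()
# shared_w, centers, codes = adaptive_weight_sharing(layer_w)
```

In a full compression pipeline this routine would be applied layer by layer, and only the cluster centers plus the per-weight indices would be stored.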
Keywords/Search Tags:Network compression, Dense network, Weight pruning, Weight clustering, Weight sharing, Sparse network