
Research On Optimization And Acceleration Methods Of Deep Neural Network Models For Hardware Implementation

Posted on: 2022-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: K Chen
Full Text: PDF
GTID: 2518306725479824
Subject: IC Engineering

Abstract/Summary:
In the past decade, deep neural networks (DNNs) have developed rapidly as one of the most prominent techniques in numerous real-world settings, driving industrial innovation and transformation and integrating artificial intelligence more deeply into people's lives. Thanks to the growth of data, the improvement of computing power, algorithmic innovation, and the popularity of open-source frameworks, DNN applications have seen explosive growth. DNNs are now ubiquitous in domains such as intelligent robots, autonomous vehicles, computer vision, and speech processing, where they often deliver state-of-the-art performance that exceeds human accuracy.

However, this excellent performance comes at the cost of high computational complexity, and given the huge parameter counts of most DNN models, DNNs are undoubtedly costly in terms of energy consumption. As application scenarios multiply, DNN processing platforms are trending toward customized hardware accelerators. More and more DNNs need to be deployed on edge devices with small memories and limited computing resources, such as mobile phones, which requires DNNs to operate more energy-efficiently. In this context, processing DNNs efficiently, so as to flexibly achieve high performance without sacrificing accuracy in the target scenario, is vital for deployment. In addition, while open-source deep-learning frameworks prevail for constructing, training, and deploying DNNs on general-purpose computing devices, there is currently no universal framework capable of optimizing DNN deployment and guiding hardware design for custom-designed accelerators.

Therefore, this thesis focuses on the efficient processing of DNNs, adopts the idea of hardware/algorithm co-design, takes existing frameworks as the starting point of model optimization, and presents a novel solution suited to hardware inference. We propose a cross-framework DNN optimization framework for efficient hardware inference. First, we leverage ONNX, an open-source format, as an intermediate representation to convert models from other frameworks into ONNX, incorporating customized options such as layer-wise operator fusion in the process. This not only removes cross-framework obstacles to deploying DNNs on various edge computing devices, but also significantly reduces overall complexity. Based on the ONNX models, we apply dynamic fixed-point (DFP) quantization to the various parameters in DNNs to reduce data precision and thereby cut data movement and storage overhead on hardware. Using a small amount of unlabeled data, our framework performs a statistical analysis of the weights and activations observed during DNN inference. We further propose a weight equivalent transformation based on this statistical information that optimizes the weight/activation dynamic ranges to improve quantization accuracy, without requiring retraining to adjust hyperparameters. Meanwhile, given DNNs' great success in computer vision, we evaluate several object detection and image classification models under our framework, with all calculations carried out in DFP arithmetic. Furthermore, based on DFP hardware inference, we extensively study the performance of several DNNs under various bit-width quantization strategies and conduct detailed experiments on weight-constrained and activation-constrained quantization to identify the optimal solutions and guide the choice of hardware accelerator design parameters.
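To make the dynamic fixed-point idea above concrete, the sketch below shows one common way of picking a per-tensor power-of-two scale from observed statistics and quantizing a tensor to a given bit width. It is only an illustration under assumed conventions; the function names (dfp_quantize, dfp_dequantize) and the exact fractional-length rule are not taken from the thesis.

```python
import numpy as np

def dfp_quantize(x, bit_width=8):
    """Quantize a tensor to dynamic fixed-point (DFP).

    The fractional length (fl) is chosen per tensor so that the largest
    magnitude observed in x still fits in the signed integer range, i.e.
    the scale is a power of two derived from the tensor's dynamic range.
    """
    max_abs = np.max(np.abs(x)) + 1e-12            # dynamic range from statistics
    int_bits = int(np.ceil(np.log2(max_abs)))      # bits needed for the integer part
    fl = (bit_width - 1) - int_bits                # fractional length (sign bit excluded)
    scale = 2.0 ** fl

    qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)
    return q, fl                                   # integer values plus shared exponent

def dfp_dequantize(q, fl):
    """Recover approximate real values from DFP integers."""
    return q.astype(np.float32) / (2.0 ** fl)

# Example: quantize a weight tensor to 8-bit DFP and check the error.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, fl = dfp_quantize(w, bit_width=8)
w_hat = dfp_dequantize(q, fl)
print("fractional length:", fl, "max abs error:", np.max(np.abs(w - w_hat)))
```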
In addition, via a ready-to-use DNN compiler, the results of model optimization, including the updated ONNX file and the scale information, can be compiled into binary files containing the crucial information required for hardware inference. Moreover, to further optimize DNN hardware inference, we propose a baseline reference design of a reconfigurable hardware accelerator that achieves efficient and accurate inference. Activation-constrained quantization experiments show that, compared with FP32 accuracy, the 12-bit object detection models suffer a maximum mAP loss of 1.5%, while the Top-1 accuracy loss of the 8-bit classification models is no more than 1.2%.
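As a rough illustration of the weight equivalent transformation mentioned above, the sketch below applies cross-layer range equalization between two consecutive layers: scaling the output channels of one layer and compensating in the next leaves a ReLU network's function unchanged while balancing per-channel dynamic ranges before quantization. The helper name equalize_consecutive_layers and the square-root balancing rule are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def equalize_consecutive_layers(w1, b1, w2):
    """Per-channel weight equivalent transformation between two layers.

    For y = W2 @ relu(W1 @ x + b1), dividing output channel i of W1 (and b1)
    by s_i while multiplying the matching input channel of W2 by s_i leaves
    the network function unchanged (ReLU is positively homogeneous), but lets
    a single shared fixed-point scale waste fewer quantization levels.
    """
    r1 = np.max(np.abs(w1), axis=1)            # range of each output channel of layer 1
    r2 = np.max(np.abs(w2), axis=0)            # range of each input channel of layer 2
    s = np.sqrt(r1 / (r2 + 1e-12)) + 1e-12     # balance the two per-channel ranges

    w1_eq = w1 / s[:, None]                    # shrink layer-1 channels...
    b1_eq = b1 / s
    w2_eq = w2 * s[None, :]                    # ...and compensate in layer 2
    return w1_eq, b1_eq, w2_eq

# Example with two fully connected layers: the transformed network computes
# the same function as the original one up to floating-point rounding.
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((128, 64)), rng.standard_normal(128)
w2 = rng.standard_normal((10, 128))
w1_eq, b1_eq, w2_eq = equalize_consecutive_layers(w1, b1, w2)

x = rng.standard_normal(64)
y_ref = w2 @ np.maximum(w1 @ x + b1, 0.0)
y_eq = w2_eq @ np.maximum(w1_eq @ x + b1_eq, 0.0)
print("max output difference:", np.max(np.abs(y_ref - y_eq)))
```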
Keywords/Search Tags: deep learning, deep neural networks (DNNs), hardware accelerator architecture, hardware/algorithm co-design, quantization, DNN optimization, ONNX