| In recent years, convolutional neural networks (CNNs) have been widely used in computer vision and have achieved great success. To obtain high accuracy, CNN models have grown larger and larger, making it difficult to deploy them directly on resource-limited mobile devices while meeting requirements for low power consumption and real-time performance. Making networks lightweight is a prerequisite for deploying deep neural networks on edge devices. Designing more lightweight networks at the algorithm level reduces the amount of computation and the number of parameters, thereby lowering the demand for hardware resources and accelerating inference. Research on lightweight CNNs is therefore significant. Such networks also require a real hardware platform, but current general-purpose processors have inherent deficiencies in processing CNNs, so dedicated hardware accelerators must be studied. Therefore, this thesis studies lightweight CNNs and their dedicated hardware accelerators. The main work is as follows:

(1) Depthwise separable convolution plays an important role in lightweight CNNs. The roofline model is used to analyze the characteristics of depthwise separable convolutional neural networks. On this basis, several techniques are proposed: a shared computing unit for depthwise and pointwise convolutions, a channel-enhancement approach with high computing-resource utilization, adjustable parallelism for different kinds of pointwise convolutions, feature-map storage optimization for efficient memory access, and an efficient workflow for preloading input images. For SkyNet, the proposed accelerator achieves 80.03 FPS, a throughput of 0.207 GOPS per DSP, and an energy cost of 0.072 J per detected image, outperforming the champion design of DAC-SDC 2020.

(2) To mitigate the parameter imbalance between the depthwise convolution, which extracts spatial features, and the pointwise convolution, which fuses channel information, in the depthwise
separable convolutional neural network, approaches are proposed at the convolution-block granularity and the convolution-layer granularity, respectively. At the block granularity, an efficient DPD convolution block composed of two depthwise layers and one pointwise layer is designed, and a pyramid DPD block with multi-level spatial feature-extraction capability is then proposed. With a similar parameter count, PydDPDNet is about 1% more accurate than MobileNetV2. At the layer granularity, overlapped group convolution (OGC), which improves information exchange along the channel dimension, and an efficient shared-kernel sliding convolution on the channel dimension (SKC) are proposed. An extremely efficient depthwise separable convolution (XDSC), composed of depthwise convolution and SKC, is then proposed to replace standard convolution, compressing the network while balancing its spatial feature-extraction and channel information-fusion abilities. This helps CNNs be deployed on edge devices.

(3) To fully exploit the lightweight advantages of XDSC, the Winograd algorithm is used to simplify XDSC and a hardware accelerator is designed. On the algorithm side, a method for decomposing a large kernel into small kernels is presented, and a hardware-friendly VGG16-XDSC that can be accelerated by a unified Winograd convolution operator is designed. On the hardware side, a series of optimizations for accelerating XDSC computation is proposed, including shared 1D and 2D Winograd computing engines, a merged processing unit for input transformation and element-wise multiplication, an efficient row-transformation design, and an input feature-map loading technique. Through this algorithm-hardware co-design, the accelerator based on the Ultra96 V2 FPGA development board uses only 259 DSPs to achieve 34.1 FPS with an energy efficiency of 14.6 GOPS/W.

In summary, this thesis studies lightweight neural networks and their dedicated FPGA accelerators. First, an approach
for accelerating depthwise separable convolution is proposed. Then the parameter-imbalance problem in depthwise separable convolution is addressed at both the convolution-block and convolution-layer granularities, yielding the efficient DPDNet, OGC, and SKC. Finally, the Winograd algorithm is used to accelerate computation, and a dedicated hardware accelerator is implemented for XDSC. The proposed lightweight convolutional neural networks and dedicated hardware-acceleration methods help CNNs be deployed on edge devices.
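The parameter savings of depthwise separable convolution, and the depthwise/pointwise imbalance that motivates contribution (2), can be sketched with a simple parameter count. This is a generic illustration (function names and layer sizes are hypothetical, biases omitted), not the thesis's specific networks:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution: every output channel
    mixes all input channels with its own k x k kernel."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise separable convolution: one k x k kernel per input channel
    (spatial features), then a 1 x 1 pointwise layer (channel fusion)."""
    depthwise = c_in * k * k        # spatial feature extraction
    pointwise = c_in * c_out        # channel information fusion
    return depthwise + pointwise

# Illustrative layer: 128 -> 128 channels, 3 x 3 kernels
std = conv_params(128, 128, 3)      # 147456 parameters
dsc = dsc_params(128, 128, 3)       # 17536 parameters, ~8.4x fewer
print(std, dsc)
```

Note that in the separable layer the pointwise part holds 16384 of the 17536 parameters, i.e. the depthwise/pointwise parameter ratio is heavily skewed toward channel fusion, which is the imbalance that the DPD block, OGC, and SKC are designed to redress.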
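The multiplication savings behind contribution (3) can be illustrated with the textbook 1D Winograd case F(2,3), which computes two outputs of a 3-tap convolution with 4 multiplies instead of 6. This is a sketch of the standard algorithm only, not the thesis's unified XDSC operator:

```python
import numpy as np

# Standard F(2,3) transform matrices: Y = A^T [ (G g) * (B^T d) ],
# where * is element-wise; the 4-element product is the 4 multiplies.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of sliding 3-tap filter g over 4-element input tile d."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
# Direct computation: [1+2+3, 2+3+4] = [6, 9]
print(winograd_f23(d, g))  # -> [6. 9.]
```

Nesting this transform in two dimensions gives F(2x2, 3x3), which replaces 36 multiplies per output tile with 16; sharing the 1D and 2D engines, as the thesis's accelerator does, amortizes the transform hardware across both cases.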