| In recent years, the Convolutional Neural Network (CNN) has been widely used as an essential deep learning model in speech recognition, image processing, and other fields. With the explosive growth in the number of CNN parameters, traditional general-purpose computing architectures can hardly satisfy the computational demands. In this context, Domain-Specific Architecture (DSA) accelerators are becoming the mainstream computational platform for CNNs. While highly integrated CNN accelerators bring performance benefits, they also make the system vulnerable to strikes by high-energy neutrons or α-particles, which cause "soft errors". Recent studies have found that soft errors in important CNN parameters can lead to catastrophic consequences such as target misidentification. Therefore, in addition to optimizing energy efficiency, it is crucial to ensure the reliability of accelerators under the influence of soft errors. Traditional fault-tolerant techniques such as Dual/Triple Modular Redundancy (DMR/TMR) and Error Correction Code (ECC) achieve high error detection rates but usually incur significant time and resource overheads. The regular structure and simple control of the CNN accelerator provide low computational latency, and directly applying these "error-efficient" but high-overhead fault-tolerant designs to the CNN accelerator would weaken its performance advantage, seriously hindering the broad application of CNNs in real-time and security scenarios. To effectively reduce the fault-tolerance overhead of CNN accelerators, this paper introduces the idea of approximate computing into accelerator fault-tolerance design by exploiting the inherent error resilience of CNNs, and proposes two approximate fault-tolerance strategies based on hardware-software co-design: selective fault tolerance and imprecise fault tolerance. Based on the differences in error sensitivity among filters, selective fault tolerance reduces the fault-tolerance overhead by avoiding over-protection of
error-robust filters; leveraging the output correlation among some filters in the same layer of the CNN, imprecise fault tolerance generates a check value at "group granularity" for similar filters and minimizes the fault-tolerance overhead by reducing the number of check values generated. The specific fault-tolerance strategies are as follows: 1) Based on the differences in error sensitivity among filters, selective fault tolerance redundantly executes only the error-sensitive filters to reduce the system's fault-tolerance overhead. First, to quickly evaluate the error sensitivity of filters, we propose a gradient-based, filter-level error sensitivity evaluation method, which is, on average, 2364 times faster than the traditional fault injection method. Second, we perform the redundant computation of sensitive filters by "recycling" some idle computational units on the accelerator during convolutional layer computation. Finally, we deploy the redundant filter and the original filter in adjacent computational columns to ensure real-time error detection and recovery. 2) Leveraging the similarity among filters in the same layer, imprecise fault tolerance generates check values at "filter group" granularity for group-level imprecise error detection. By loosening the error constraint, imprecise fault tolerance reduces the computational overhead of the check values while ensuring that serious errors are detected. First, to effectively reveal the output similarity among some filters, this paper adopts a mean shift clustering algorithm to divide filters into checksum groups and uses the intra-group means to generate check filters. Second, to ensure that serious errors in the system are effectively detected, this paper determines the check threshold of each filter by multi-input forward inference. Finally, this paper maps the filters within the same checksum group to adjacent processing element columns to ensure that the output value and the checksum value reach the checksum unit at the
same time, reducing the checking delay. The above two approximate fault-tolerant designs reduce the system's fault-tolerance overhead by exploiting the differences in error sensitivity among filters and the output correlation of some filters, respectively. Moreover, both selective fault tolerance and imprecise fault tolerance take the computational characteristics of the systolic array into account. This paper further discusses the combination of the two approximate fault-tolerance techniques: first, the selective fault-tolerance strategy is applied to perform exact redundancy checks on the error-sensitive filters in each layer to ensure their reliability; second, inexact checks are performed on the insensitive filters to avoid serious errors. Taking CNN models of different sizes as baselines (including AlexNet, ResNet-20, VGG-16, and ResNet-50), we evaluated the error coverage and execution time of the proposed approximate fault-tolerant designs through simulations on a systolic-array-based CNN accelerator. The results show that the proposed selective fault tolerance strategy covers 96.40% of errors with 24.40% additional performance overhead on average, and imprecise fault tolerance covers 85.24% of errors with 18.43% additional performance overhead. Compared with the traditional DMR fault tolerance mechanism, selective fault tolerance and imprecise fault tolerance reduce the performance overhead by 75.60% and 81.57%, respectively. |
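
To make the gradient-based, filter-level sensitivity evaluation more concrete, the following is a minimal sketch and not the paper's actual method. It assumes a first-order (Taylor-style) score, the sum of |w · ∂L/∂w| over each filter's weights, and a toy model in which every "filter" is a weight vector producing one output under a squared-error loss. The function names, the toy data, and the protection `budget` are all hypothetical.

```python
# Toy filter-level sensitivity ranking (illustrative assumption, not the
# paper's exact formulation). Each "filter" is a weight vector whose output
# is a dot product with the input; the loss is 0.5 * sum((y - target)^2).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_gradients(weights, x, targets):
    """Analytic dL/dW for the toy squared-error loss: (y_k - t_k) * x."""
    grads = []
    for w, t in zip(weights, targets):
        err = dot(w, x) - t
        grads.append([err * xi for xi in x])
    return grads

def filter_sensitivity(weights, grads):
    """First-order score per filter: sum of |w * dL/dw| over its weights."""
    return [sum(abs(w * g) for w, g in zip(wf, gf))
            for wf, gf in zip(weights, grads)]

def select_sensitive(scores, budget):
    """Indices of the `budget` highest-scoring filters, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Hypothetical example data.
x = [1.0, 2.0]
weights = [[0.5, 0.5], [1.0, -1.0], [2.0, 0.0]]
targets = [1.0, 0.0, 2.0]

grads = loss_gradients(weights, x, targets)
scores = filter_sensitivity(weights, grads)
protected = select_sensitive(scores, budget=2)
print(scores, protected)
```

Only the filters in `protected` would be executed redundantly. A single backward pass yields all scores at once, which is the intuition behind such an approach being much cheaper than per-filter fault injection.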
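
The imprecise fault-tolerance flow (group filters by similarity, build a check filter from the intra-group mean, calibrate per-filter thresholds over several inputs, then flag large deviations) can be sketched as follows. This is an illustrative approximation under stated assumptions: filters are flattened to short weight vectors, the mean shift uses a flat kernel with a hand-picked `bandwidth`, each threshold is the largest fault-free deviation scaled by a hypothetical `margin`, and all names and data are invented for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_shift_labels(points, bandwidth, max_iter=100):
    """Flat-kernel mean shift; returns one group label per point."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(p, m) <= bandwidth]
            shifted.append([sum(c) / len(near) for c in zip(*near)])
        converged = all(math.dist(a, b) < 1e-9 for a, b in zip(modes, shifted))
        modes = shifted
        if converged:
            break
    centers, labels = [], []
    for m in modes:  # merge modes that settled on (nearly) the same point
        for k, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def group_check_filter(filters, members):
    """Check filter = element-wise mean of the filters in one group."""
    size = len(members)
    return [sum(filters[i][j] for i in members) / size
            for j in range(len(filters[0]))]

def calibrate_thresholds(filters, members, check, inputs, margin=1.5):
    """Per-filter threshold: margin * largest fault-free deviation seen."""
    return {i: margin * max(abs(dot(filters[i], x) - dot(check, x))
                            for x in inputs)
            for i in members}

def flag_errors(outputs, check_value, thresholds):
    """Filters whose output strays from the group check value too far."""
    return [i for i in sorted(outputs)
            if abs(outputs[i] - check_value) > thresholds[i]]

# Hypothetical filters: three similar ones and two from another cluster.
filters = [[1.0, 1.0], [1.0, 1.5], [1.5, 1.0], [-4.0, 3.0], [-4.0, 3.5]]
labels = mean_shift_labels(filters, bandwidth=1.0)
members = [i for i, g in enumerate(labels) if g == 0]
check = group_check_filter(filters, members)
calibration = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thresholds = calibrate_thresholds(filters, members, check, calibration)

x = [1.0, 1.0]
clean = {i: dot(filters[i], x) for i in members}
check_value = dot(check, x)
faulty = dict(clean)
faulty[1] += 8.0   # large deviation, e.g. a high-order bit flip
slight = dict(clean)
slight[1] += 0.1   # small deviation that the loosened check tolerates

print(labels, flag_errors(faulty, check_value, thresholds))
```

One check filter serves the whole group, so only one extra dot product is computed per group rather than one per filter. The trade-off is that small deviations, such as the `slight` case, pass undetected.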
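
The reported overhead reductions are numerically consistent with taking the additional performance overhead of DMR as 100% (full duplication) and subtracting each strategy's overhead in percentage points. This reading is an assumption inferred from the figures, not something the text states explicitly. Writing $O$ for the additional performance overhead:

```latex
\begin{aligned}
\Delta_{\text{selective}} &= O_{\text{DMR}} - O_{\text{selective}} = 100\% - 24.40\% = 75.60\% \\
\Delta_{\text{imprecise}} &= O_{\text{DMR}} - O_{\text{imprecise}} = 100\% - 18.43\% = 81.57\%
\end{aligned}
```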