
The Training System Of Fault-tolerant Neural Network Model Based On CPU-FPGA

Posted on: 2020-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: K Z Xing  Full Text: PDF
GTID: 2428330578459475  Subject: Electronic Science and Technology
Abstract/Summary:
Major companies such as Google, ARM, and NVIDIA have launched hardware and frameworks for the Internet of Things (IoT). Combining deep learning accelerators with IoT applications will give a new boost to technological innovation. However, a large gap often exists between the designed target system and the expected product, because the actual working environment of the target system is often extremely demanding. The accuracy of the originally trained model may degrade, or the model may even fail, in the actual working environment. We therefore propose a CNN retraining method based on a heterogeneous CPU-FPGA platform, in which the forward propagation algorithm is implemented on the FPGA and the back propagation algorithm is implemented on the CPU; together, the CPU and FPGA complete the training of the CNN. Through retraining, the CNN model can learn nondeterministic behavior such as deviations in the working environment, so the accuracy of the neural network model in the actual working environment can be improved. Finally, we apply this method to approximate computing and soft error scenarios. Experiments show that the top-5 and top-1 accuracy of the neural network improve by 5.7% and 8.6%, respectively, compared with the offline-trained model.

In addition, we apply overclocking to further improve the performance of CNN accelerators. In the overclocked state, however, the prediction accuracy of the CNN accelerator may drop, and the accelerator may even crash, because of timing violations on the critical path. We therefore insert additional reference images to determine the precision loss induced by overclocking. The precision loss is divided into four states: small precision loss, medium precision loss, severe precision loss, and accelerator crash. Small precision loss has no impact on the application, and the CNN accelerator can be used normally; for medium precision loss, we use the retraining method to improve accuracy; for severe precision loss and accelerator crash, we propose a checkpoint strategy for recovery. The main idea of the checkpoint strategy is to lower the frequency of the CNN accelerator and re-execute the image classification task. Experimental results show that the optimization method can still improve the performance of the CNN accelerator by 20% and reduce energy consumption by 30% while keeping the accuracy loss below 2%.
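The four-state handling of overclocking-induced precision loss can be illustrated with a minimal sketch. It is not the thesis implementation: the threshold values, the function names `classify_loss` and `choose_action`, and the convention that a crash is signalled by `None` are all assumptions made for illustration, since the abstract does not give these details.

```python
from enum import Enum


class LossState(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    SEVERE = "severe"
    CRASH = "crash"


# Hypothetical thresholds on the accuracy drop (as fractions).
# The abstract does not state where the state boundaries lie.
SMALL_MAX = 0.02
MEDIUM_MAX = 0.10


def classify_loss(ref_predictions, ref_labels, baseline_accuracy):
    """Map results on the inserted reference images to one of four states.

    ref_predictions is None when the accelerator returned no result,
    which this sketch treats as a crash (an assumption).
    """
    if ref_predictions is None:
        return LossState.CRASH
    if not ref_labels:
        raise ValueError("at least one reference image is required")
    correct = sum(p == y for p, y in zip(ref_predictions, ref_labels))
    accuracy = correct / len(ref_labels)
    drop = baseline_accuracy - accuracy
    if drop <= SMALL_MAX:
        return LossState.SMALL
    if drop <= MEDIUM_MAX:
        return LossState.MEDIUM
    return LossState.SEVERE


def choose_action(state):
    """Pick the response described in the abstract for each state."""
    if state is LossState.SMALL:
        return "continue"  # accelerator can be used normally
    if state is LossState.MEDIUM:
        return "retrain"  # recover accuracy through retraining
    # Severe loss and crash both use the checkpoint strategy:
    # lower the clock frequency and re-execute the classification task.
    return "lower_frequency_and_rerun"
```

For example, with 20 reference images and a baseline accuracy of 1.0, one misclassified image gives a 5% drop, which these illustrative thresholds map to the medium state and therefore to retraining.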
Keywords/Search Tags: CNN accelerator, CNN retraining, Fault tolerance