The Training System Of Fault-tolerant Neural Network Model Based On CPU-FPGA

Posted on:2020-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:K Z Xing

GTID:2428330578459475

Subject:Electronic Science and Technology

Abstract/Summary:

Large companies such as Google, ARM and NVIDIA have launched hardware and frameworks for the IoT, and combining deep learning accelerators with IoT applications is expected to give a new boost to technological innovation worldwide. However, the actual working environment of the target system is often extremely demanding, so there can be a large gap between the designed system and the expected product. In the actual working environment, the accuracy of the originally trained model may drop, or the model may fail entirely. We therefore propose a CNN retraining method on a heterogeneous CPU-FPGA platform, in which forward propagation is implemented on the FPGA and back propagation on the CPU, so that the CPU and the FPGA together complete the training of the CNN. Through retraining, the CNN model can learn nondeterministic behavior such as deviations in the working environment, which improves the accuracy of the neural network model in the actual working environment. We applied this method to approximate computing and soft-error scenarios. Experiments show that, compared with the offline-trained model, the top-5 and top-1 accuracy of the neural network improve by 5.7% and 8.6%, respectively.

In addition, we apply overclocking to further improve the performance of the CNN accelerator. In the overclocked state, however, timing violations on the critical path may reduce the prediction accuracy of the CNN accelerator or even crash it. We therefore insert additional reference pictures to determine the precision loss induced by overclocking, and divide this loss into four states: small precision loss, medium precision loss, severe precision loss, and accelerator crash. Small precision loss has no impact on the application, so the CNN accelerator can be used normally; for medium precision loss, we use the retraining method to improve accuracy; for severe precision loss and the accelerator-crash state, we propose a checkpoint strategy for recovery. The main idea of the checkpoint strategy is to lower the clock frequency of the CNN accelerator and repeat the image classification task. Experimental results show that, with an accuracy loss of less than 2%, the optimization method still improves the performance of the CNN accelerator by 20% and reduces its energy consumption by 30%.
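
The division of work between the FPGA and the CPU described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: a single softmax layer stands in for the CNN, the FPGA is emulated in software by the hypothetical function `fpga_forward`, and the hardware deviation is modelled, purely as an assumption, by a fixed per-class gain and offset on the logits. The point it illustrates is that the loss is computed from the outputs the accelerator actually produces, so the CPU-side weight update learns to compensate for the deviation.

```python
import numpy as np

# ASSUMPTION: hypothetical model of an approximate/faulty accelerator. The ideal
# logits X @ W are distorted by a fixed per-class gain and offset. This stands in
# for approximate arithmetic or soft errors; it is not the thesis's fault model.
HW_GAIN = np.array([1.3, 0.7, 1.1])
HW_OFFSET = np.array([0.5, -0.5, 0.0])


def fpga_forward(X, W):
    """Emulated FPGA forward pass: the logits the hardware actually produces."""
    return (X @ W) * HW_GAIN + HW_OFFSET


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(P, Y):
    return float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))


def cpu_backward(X, P, Y):
    """CPU side: cross-entropy gradient of an ideal linear layer, evaluated at the
    probabilities produced by the hardware (the distortion is treated as a black box)."""
    return X.T @ (P - Y) / X.shape[0]


def retrain(X, Y, W, lr=0.1, steps=200):
    """Hybrid loop: forward pass on the (emulated) accelerator, backward pass and
    weight update on the CPU; the new weights are then used by the accelerator."""
    losses = []
    for _ in range(steps):
        P = softmax(fpga_forward(X, W))      # forward propagation on the FPGA
        losses.append(cross_entropy(P, Y))   # loss measured on hardware outputs
        W = W - lr * cpu_backward(X, P, Y)   # back propagation + update on the CPU
    return W, losses


def hw_accuracy(X, labels, W):
    """Top-1 accuracy of the model as executed on the emulated hardware."""
    return float(np.mean(fpga_forward(X, W).argmax(axis=1) == labels))


# Toy data (hypothetical): 200 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_W = rng.normal(size=(4, 3))
labels = (X @ true_W).argmax(axis=1)
Y = np.eye(3)[labels]

W, losses = retrain(X, Y, np.zeros((4, 3)))
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, "
      f"hardware accuracy {hw_accuracy(X, labels, W):.2f}")
```

Because the CPU uses the ideal-layer gradient while the loss is measured on the hardware outputs, the update only approximates the true gradient. With positive gains it is still a descent direction, so for a small enough learning rate the training loss on the hardware keeps decreasing.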
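
The four-state classification of overclocking-induced precision loss and the checkpoint recovery can be sketched as a small control loop. Everything concrete here is an assumption for illustration: the thresholds `SMALL_MAX` and `MEDIUM_MAX`, the 25 MHz frequency step, the callback `run_at`, and the toy accelerator model are hypothetical, since the abstract gives no concrete values or interfaces.

```python
from enum import Enum


class PrecisionLoss(Enum):
    SMALL = "small"    # accelerator can be used normally
    MEDIUM = "medium"  # handled by retraining
    SEVERE = "severe"  # handled by checkpoint recovery
    CRASH = "crash"    # handled by checkpoint recovery


# HYPOTHETICAL thresholds on the accuracy drop (percentage points) measured on
# the inserted reference pictures; the abstract does not state these values.
SMALL_MAX = 2.0
MEDIUM_MAX = 8.0


def classify_loss(baseline_acc, measured_acc):
    """Map reference-picture accuracy to one of the four states.
    measured_acc is None when the overclocked accelerator has crashed."""
    if measured_acc is None:
        return PrecisionLoss.CRASH
    drop = baseline_acc - measured_acc
    if drop <= SMALL_MAX:
        return PrecisionLoss.SMALL
    if drop <= MEDIUM_MAX:
        return PrecisionLoss.MEDIUM
    return PrecisionLoss.SEVERE


def run_with_checkpoint(run_at, freq_mhz, baseline_acc, step_mhz=25, min_mhz=100):
    """Checkpoint sketch: on severe loss or a crash, lower the clock and repeat
    the same classification task. run_at(f) is a hypothetical callback that
    returns reference-picture accuracy at clock f, or None on a crash."""
    while freq_mhz >= min_mhz:
        state = classify_loss(baseline_acc, run_at(freq_mhz))
        if state in (PrecisionLoss.SMALL, PrecisionLoss.MEDIUM):
            return freq_mhz, state  # MEDIUM would additionally trigger retraining
        freq_mhz -= step_mhz  # degrade the frequency and re-run the task
    raise RuntimeError("no usable frequency at or above min_mhz")


def toy_accelerator(f):
    """ASSUMPTION: timing violations start above 250 MHz, crash above 300 MHz."""
    if f > 300:
        return None
    return 90.0 - max(0.0, f - 250) * 0.3


print(run_with_checkpoint(toy_accelerator, freq_mhz=325, baseline_acc=90.0))
```

Returning on MEDIUM hands control to the retraining path instead of lowering the clock further, which keeps more of the overclocking gain; only the SEVERE and CRASH states trade frequency for correctness.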

Keywords/Search Tags:

CNN accelerator, CNN retraining, Fault tolerance

Related items

1	A High-reliability Deep Neural Network Accelerator With Hybrid Architecture
2	Research On Adaption Method Of Cloud Fault Tolerance Services Based On User Requirement And Resource Constriction
3	Study On Fault-Tolerance Mechanism And Realization In Real-Time Distributed Computer Systems
4	The Research On Improving The Fault Tolerance Capability Of Programs In Radiation Environment
5	The Design Of Autonomous Opportunistic Protection Mechanism In Neural Network Accelerator Architecture
6	Damage Analysis And Control Of Neural Network Accelerator Under Space Irradiation
7	Study And Realization Of Fault-tolerant Technology For Parallel Computer On Satellite
8	Research On Improving 3D IC Yield Based On TSV Fault-Tolerance
9	Research On Fault-Tolerance Technology For Message-Passing System
10	Research On TSVs Fault-Tolerance In 3D ICs