
Research On Energy Efficient Instruction Duplication Method For GPGPU Platform

Posted on: 2022-11-25   Degree: Master   Type: Thesis
Country: China   Candidate: N Jiang
GTID: 2518306758991499   Subject: Circuits and Systems
Abstract/Summary:
General-purpose graphics processing units (GPGPUs) are widely used in high-performance computing centers because of their high concurrency, high throughput, and increasing programmability. Advances in manufacturing processes lead to smaller chips and higher integration, which increases the probability of soft errors caused by high-energy particle strikes on GPGPU platforms. Soft errors can cause programs to produce silent data corruptions (SDCs), the most difficult type of error to detect: they affect the accuracy of the output, yet nothing during execution clearly indicates that an exception occurred. An effective SDC detection method is therefore urgently needed to ensure the reliable execution of GPGPU programs.

Full instruction duplication is an effective way to detect SDCs: it executes instructions redundantly and compares the results to judge whether a soft error has occurred. Implementing full instruction duplication on a GPGPU, however, faces two challenges. First, duplication increases the number of instructions each thread executes, which extends program execution time. Second, duplication occupies additional registers to store the copied data, so the register resources required by a single thread grow, which reduces thread concurrency and, in turn, program performance.

To address this, existing work proposes selective instruction duplication, that is, protecting only the instructions in a program that are prone to SDCs. Existing work usually uses fault injection to analyze the soft error resilience of instructions and then selects the instructions with a higher SDC tendency for protection. Accurately judging the SDC tendency of an instruction often requires detailed fault injection, however, and this process is usually very time-consuming. In addition, previous work has often over-protected programs. Because some applications tolerate minor SDC errors, the protection conditions can be relaxed further according to the user's precision requirements, so that only instructions prone to severe SDC errors are protected.

Focusing on the problems and challenges of implementing instruction duplication on GPGPUs, this thesis proposes an energy-efficient selective instruction duplication method. The contributions of this work are as follows:

1. A machine-learning-based method for predicting SDC-vulnerable instructions is proposed to efficiently identify the instructions in a program that need protection. We find that the attributes of an instruction, the function of the instruction, and features of the error propagation process can effectively characterize the SDC proneness of the instruction. We then use a machine learning classifier to explore the relationship between these features and SDC proneness. By injecting faults into only a small number of instructions to build a training set for the model, all SDC-vulnerable instructions in the program can be predicted with low time overhead.

2. Because some applications tolerate benign SDC errors, protection against these SDCs can be relaxed, and the reliability overhead can be further reduced within a reasonable range. This thesis proposes a method, based on heuristic features, for identifying instructions that cause severe SDCs. We find that the SDC severity of an instruction is related to the initial value of the corrupted data, the propagation range of the soft error, and whether the error produces detectable symptoms. Using these heuristic features and the idea of a decision tree, a method for judging the SDC severity of instructions is proposed.

3. Based on the above work, a selective instruction duplication mechanism for GPGPUs is proposed. We determine the set of protected instructions with the proposed SDC vulnerability and SDC severity identification models, and insert the duplicate instructions accordingly into the intermediate file of the GPGPU compilation process. We design a consistency check module and a soft error handling module. Finally, the instruction duplication process is carried out during the execution of the GPGPU program.

This thesis evaluates the proposed method on benchmarks that are commonly used in related research. The experimental results show that instruction duplication based on SDC-vulnerable instructions and on SDC-severity instructions achieves good SDC prediction accuracy, detecting 90.5% of SDCs and 86.2% of severe SDCs in the programs, with time overheads of 1.4 times and 1.29 times, respectively.
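
The duplicate-and-compare idea and the distinction between minor and severe SDCs can be illustrated with a small CPU-side toy model. This is only a sketch under stated assumptions, not the mechanism implemented in the thesis: the single-bit-flip fault model on a float32 value, the relative-error tolerance, and all function names (`flip_bit`, `axpy`, `axpy_duplicated`, `classify_outcome`) are hypothetical choices made for illustration.

```python
import struct


class SoftErrorDetected(Exception):
    """Raised when the original and duplicated results disagree."""


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0..31) of the IEEE-754 single-precision encoding of value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def axpy(a: float, x: float, y: float, fault_bit=None) -> float:
    """Unprotected a*x + y; an optional fault corrupts the product silently."""
    prod = a * x
    if fault_bit is not None:
        prod = flip_bit(prod, fault_bit)  # assumed fault model: one bit flip
    return prod + y


def axpy_duplicated(a: float, x: float, y: float, fault_bit=None) -> float:
    """Same computation with the multiply duplicated and checked."""
    prod = a * x
    prod_dup = a * x  # redundant copy; on a GPU this would cost an extra register
    if fault_bit is not None:
        prod = flip_bit(prod, fault_bit)  # the fault hits only the original copy
    if prod != prod_dup:  # consistency check before the value is consumed
        raise SoftErrorDetected("original and duplicate disagree")
    return prod + y


def classify_outcome(golden: float, observed: float, tolerance: float) -> str:
    """Label a faulty run by its relative output error (tolerance is user-chosen)."""
    if observed == golden:
        return "masked"
    rel_err = abs(observed - golden) / abs(golden)
    return "minor SDC" if rel_err <= tolerance else "severe SDC"


if __name__ == "__main__":
    golden = axpy(2.0, 3.0, 1.0)
    for bit in (0, 30):  # a low mantissa bit and a high exponent bit
        observed = axpy(2.0, 3.0, 1.0, fault_bit=bit)
        print(bit, observed, classify_outcome(golden, observed, tolerance=1e-3))
```

In this toy setting a flip in the lowest mantissa bit perturbs the output only slightly, while a flip in a high exponent bit changes it drastically, which is the intuition behind protecting only instructions whose corruption leads to severe SDCs. The duplicated variant detects both cases, at the cost of the extra instruction and the extra stored copy.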
Keywords/Search Tags: GPGPU, Soft Error, Fault Injection, Silent Data Corruption, Instruction Duplication