Research On Soft Error Detection Model Based On GPGPU Platform

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Zhang
GTID: 2428330629452671
Subject: Computer system architecture
Abstract/Summary:
General-purpose graphics processing units (GPGPUs) have become indispensable computing units in many scientific fields because of their high concurrency and high throughput. This computing power comes from the hundreds of highly integrated cores on the GPU chip, across which computing tasks are distributed and executed in parallel. As transistor sizes shrink and chip integration increases, the highly integrated GPGPU platform becomes more vulnerable to strikes by high-energy particles, which cause bit flips in logic units, also known as soft errors. Soft errors in the underlying chip can easily affect applications running in the upper layers. Among their outcomes, Silent Data Corruption (SDC) cannot be caught by symptom-based detection mechanisms, because it corrupts the output without generating any abnormal information during execution. To ensure reliable execution of GPGPU applications, an effective SDC detection mechanism is therefore essential.

Full instruction redundancy is the most effective and intuitive method for detecting SDCs: it detects errors by comparing the result of each original instruction with that of its copy. However, the redundant execution incurs excessive space and computing overhead, which seriously degrades the parallel processing performance of the GPGPU platform. To reduce this cost, existing research uses fault injection to identify the instructions in an application that are most likely to produce SDCs, and applies redundancy selectively to those instructions. To identify the instructions with high SDC tendency at a given level of confidence, however, a large number of fault injections must be performed across all possible fault locations in the program, which is time-consuming.

To address the high cost of SDC detection in GPGPU applications, this thesis proposes an SDC detection model based on support vector machines. The model can detect most SDCs in a program on the premise of only a small amount of fault injection. The SDC tendency of a subset of the program's instructions is obtained by fault injection, and the context information of each instruction, such as the instruction type, the function in which the instruction resides, and the subsequent instructions it affects, is extracted. With these data as the training set, a support vector machine captures the correlation between these features and SDC tendency and predicts the SDC tendency of the instructions that were not injected.

Building on this, and considering that some GPGPU programs can tolerate a certain degree of error, SDCs of low severity can be regarded as acceptable results. This thesis therefore treats different fault types according to their influence on program execution: it quantifies the difference between an SDC output and the accurate output and ignores SDCs that fall within the target output quality. This lowers the chance that an instruction is classified as SDC-prone, and thus reduces the number of instructions that require selective redundancy.

Finally, the model is applied to an actual GPGPU platform. The results show that the proposed model achieves accurate prediction of 89% of SDC-prone instructions with fault injection on only 20% of the instructions in the program, which significantly reduces the analysis cost of fault injection. With the detection model, 89.7% SDC error coverage is achieved by applying redundancy to 50% of the instructions in the program. In addition, tolerating low-severity SDCs further reduces the overhead of instruction redundancy.
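The tolerance step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's actual method: the relative L2 error as the quality metric, the four outcome labels, the tolerance values, and the instruction identifiers such as `add.f32@12`. It labels each fault-injection run, then computes a per-instruction SDC tendency as the fraction of injections that end in a non-tolerable SDC.

```python
from collections import defaultdict
from math import sqrt


def relative_l2_error(golden, observed):
    """Relative L2 distance between the fault-free and the faulty output.

    Assumed quality metric; the abstract does not name one.
    """
    num = sqrt(sum((g - o) ** 2 for g, o in zip(golden, observed)))
    den = sqrt(sum(g ** 2 for g in golden))
    return num / den if den else num


def classify_run(golden, observed, crashed, tolerance):
    """Label one fault-injection run (label names are hypothetical).

    'crash'     : the run showed a visible symptom
    'masked'    : output identical to the fault-free output
    'tolerable' : output differs, but within the accepted quality loss
    'sdc'       : output differs beyond the accepted quality loss
    """
    if crashed:
        return "crash"
    if observed == golden:
        return "masked"
    if relative_l2_error(golden, observed) <= tolerance:
        return "tolerable"
    return "sdc"


def sdc_tendency(runs, golden, tolerance):
    """Fraction of injections per instruction that end in a non-tolerable SDC."""
    total = defaultdict(int)
    sdc = defaultdict(int)
    for instr_id, observed, crashed in runs:
        total[instr_id] += 1
        if classify_run(golden, observed, crashed, tolerance) == "sdc":
            sdc[instr_id] += 1
    return {instr_id: sdc[instr_id] / total[instr_id] for instr_id in total}


# Made-up injection log: (instruction id, faulty output, crashed?)
golden = [1.0, 2.0, 3.0, 4.0]
runs = [
    ("add.f32@12", [1.0, 2.0, 3.0, 4.0], False),    # masked
    ("add.f32@12", [1.0, 2.0, 3.0, 4.001], False),  # tiny deviation
    ("mul.f32@20", [1.0, 2.0, 9.0, 4.0], False),    # large deviation
    ("mul.f32@20", [1.0, 2.0, 3.0, 4.0], True),     # crash
]

strict = sdc_tendency(runs, golden, tolerance=0.0)    # every deviation counts
relaxed = sdc_tendency(runs, golden, tolerance=0.01)  # accept up to 1% error
print(strict)
print(relaxed)
```

With the strict setting, both hypothetical instructions have an SDC tendency of 0.5. With the relaxed setting, the tiny deviation is tolerated, so the tendency of `add.f32@12` drops to 0.0 and it would no longer be a candidate for selective redundancy.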
Keywords/Search Tags: GPGPU, Soft Error, Fault Injection, Instruction Redundancy, Support Vector Machine